1. Executive Summary

Well-calibrated  human-automation  trust  (HAT)  is  an  essential  ingredient  for  efficiency, 
communication,  and  safety  in  complex  human-automation  interactions.  A  dichotomy  between 
HAT  and  human-human  trust  (HHT)  has  been  proposed:  some  scholars  argue  that  HAT  and 
HHT  are  fundamentally  different  due  to  initial  perception  and  lack  of  intention  on  the  part  of 
automation,  while  others  claim  that  HAT  and  HHT  are  equal,  since  similar  social  interactions 
as between humans can be elicited when automation is designed to be human-like. Although
recent behavioral research has provided evidence for both accounts and a plethora of neural
evidence for HHT already exists, the underlying neural signatures of HAT and its
relationship to HHT are still unexplored. Behavioral measures alone are unlikely to allow one
to  distinguish  between  HHT  and  HAT,  because  the  same  behavioral  outcome  can  be 
associated  with  very  different  underlying  neural  mechanisms.  Assessing  both  performance 
and  brain  function  can  provide  more  information  than  either  alone.  The  objective  of  this 
proposal  was  to  investigate  the  similarities  and  differences  of  the  neural  systems  of  HAT  and 
HHT  in  a  series  of  three  studies  that  combined  a  behavioral  X-ray  luggage-screening  task 
with  functional  magnetic  resonance  imaging  (fMRI)  and  manipulated  reliabilities  of  advice 
(unknown  to  the  participants)  as  the  key  feature  for  HAT  and  HHT  interactions.  Healthy 
participants  were  asked  to  search  for  knives  hidden  in  densely  cluttered  X-ray  images  of 
luggage  after  receiving  advice  (presence  or  absence  of  a  knife)  from  a  human  or  automated 
luggage  inspector  (framed  as  experts).  HAT  and  HHT  were  measured  as  the  acceptance  rates 
of advice given by either the machine or the human agent. By adopting a comprehensive,
interdisciplinary  research  program  including  scientists  from  social  cognitive  neuroscience, 
psychology,  and  human  factors,  we  accomplished  the  overall  objective  of  this  proposal  by 
pursuing  the  following  three  specific  aims: 

Aim #1: Neural signatures of HAT based on reliable human-automation interactions. In
study 1, participants performed the security screening task and decided whether to search or
clear  the  luggage  after  receiving  advice  from  a  human  or  automated  luggage  inspector  with  a 
manipulated  reliability  of  90%.  HHT  was  initially  lower  than  HAT,  probably  due  to  the 
preconceived notions of automation being perfect. However, over time the differences between
HHT and HAT disappeared as the received feedback increased confidence in the human adviser’s
ability to perform the task. This reinforcement learning process was
mirrored  by  activations  in  reward-sensitive  brain  regions,  including  the  dorsal  striatum  and 
ventromedial prefrontal cortex. In summary, by comparing HHT and HAT, study 1 provided the
first neural evidence showing how automation bias mediates these types of trust, thus leading
to  behavioral  differences  in  the  context  of  advice  taking. 

Aim #2: Neural signatures of HAT based on unreliable human-automation interactions
due to high false alarm rates. In study 2, participants completed the X-ray
luggage-screening task by either rejecting or accepting bad or good advice from either a machine or
human  inspector  with  a  manipulated  reliability  of  60%  (false  alarm  rate).  Unreliable  advice 
decreased  performance  overall.  HHT  was  lower  than  HAT  during  bad  advice,  presumably  due 
to  reevaluation  of  expectations  arising  from  association  of  dispositional  credibility  for  each 
agent.  Trust  differences  engaged  brain  regions  associated  with  the  mentalizing  network  for 
evaluating  personal  characteristics  and  traits  (precuneus,  posterior  cingulate  cortex, 
temporoparietal  junction)  and  the  salience  network  for  interoception  (posterior  insula). 
Posterior  insula  and  left  precuneus  were  the  drivers  of  the  HHT  network  that  were 
reciprocally  connected  to  each  other  and  also  projected  to  all  other  regions.  In  summary,  study 
2  revealed  insights  into  the  neural  underpinnings  of  HAT  and  HHT  associated  with  unreliable 
advice  utilization  due  to  high  false  alarm  rates. 

Aim #3: Neural signatures of HAT based on unreliable human-automation interactions
due to high miss rates (60%). In study 3, participants performed the X-ray
luggage-screening task by either accepting or rejecting good or bad advice from either a human or a
machine inspector with a manipulated reliability of 60% (miss rate). HAT decreased more
than HHT over time, possibly due to high expectations of reliable advice from a machine and
changes  in  attention  allocation  due  to  miss  errors.  Brain  areas  involved  with  the  salience  and 
mentalizing networks, as well as sensory processing involved with attention, were less active
for HAT than for HHT. The HAT network consisted of attentional modulation of sensory
information  with  the  lingual  gyrus  as  the  driver  during  the  decision  phase  and  the  fusiform 
gyrus  as  the  driver  during  the  feedback  phase  of  the  task.  In  summary,  study  3  expanded  on 
the  existing  literature  by  showing  how  misses  degrade  HAT  in  comparison  to  HHT,  which  is 
represented  in  brain  regions  involved  in  salience  detection  and  self-processing  with  perceptual 
integration. 

The performed studies are innovative because they were among the first to directly examine
and  compare  the  neural  signatures  of  HAT  (and  its  relationship  to  HHT)  in  the  context  of 
human-automation  performance  applying  a  multi-disciplinary  approach.  The  findings  have 
significant implications for society because of progressions in technology and increased
interactions  with  machines.  Moreover,  those  findings  are  relevant  to  the  Air  Force  Office  of 
Scientific  Research’s  mission  aimed  at  fostering  innovative  research  and  enhancing  the  Air 
Force's  impact  on  policies  and  operations  related  to  national  security  by  investing  in  the 
discovery  of  the  foundational  concepts  of  trust  building  and  trust  calibration  during  complex 
human-machine interactions. Overall, the successful completion of this project resulted in two
substantive project outcomes: first, a significant increase in our knowledge about the
underlying neural circuits of HAT calibration during complex human-automation interactions
and, second, a laboratory-derived methodology and rationale for exploring HAT in
field research and for developing transformative novel theories and models.

2.  Personnel  Supported: 

PI:  Dr.  Frank  Krueger 

Co-PI:  Dr.  Raja  Parasuraman  passed  away  during  the  last  year  of  the  project. 

Graduate  student:  Kimberly  Goodyear 

3.  Publications: 

Findings  of  study  1  were  submitted  as  an  abstract  to  the  21st  Annual  Meeting  of  the  Cognitive 
Neuroscience  Society  (Boston,  MA;  April  5-8,  2014): 

Title:  How  automation  bias  influences  human-human  and  human-automation  trust:  An  fMRI  study 
Authors:  Goodyear  K,  Bowman  A,  Chernyak  S,  De  Visser  E,  Parasuraman  R,  Krueger  F. 

Findings  of  study  2  were  submitted  as  an  abstract  to  the  Society  for  Social  Neuroscience 
Annual  Meeting  (Chicago,  IL;  October  16,  2015): 

Title:  Comparisons  of  advice  utilization  during  human  and  machine  agent  interactions:  a  functional 
magnetic  resonance  imaging  and  effective  connectivity  study 

Authors:  Goodyear  K,  Parasuraman  R,  Chernyak  S,  Madhavan  P,  Deshpande  G,  Krueger  F. 

The research effort for this project culminated in the production of one dissertation. In April
2016, Kimberly S. Goodyear will defend her dissertation entitled “The neural basis of advice
utilization during human and machine agent interactions” to the graduate faculty of George
Mason University in partial fulfillment of the requirements for the degree of Doctor of
Philosophy in Neuroscience. The dissertation includes the findings from study 1 and study 2 (see
attachment).  The  PI  of  the  research  project  will  act  as  the  Dissertation  Director. 
Moreover, a manuscript entitled “Advice utilization during human and machine
interactions:  an  fMRI  and  effective  connectivity  study”  based  on  the  findings  of  study  2  is 
currently  under  review  as  an  original  research  article  in  the  journal  “Frontiers  in  Human 
Neuroscience”: 

Authors:  Kimberly  Goodyear,  Raja  Parasuraman,  Sergey  Chernyak,  Poornima  Madhavan, 
Gopikrishna  Deshpande,  Frank  Krueger 

Author  Contributions:  K.G.  and  S.C.  acquired  the  data  for  analysis.  K.G.,  R.P.  and  F.K.  contributed 
to  the  conception  of  the  design.  K.G.,  R.P.,  S.C.,  P.M.,  G.D.  and  F.K.  contributed  to  interpretation 
of  the  data.  K.G.,  R.P.,  S.C.,  P.M.,  G.D.  and  F.K.  contributed  to  drafting  of  the  work  and  revising  it 
critically.  K.G.,  R.P.,  S.C.,  P.M.,  G.D.  and  F.K.  approved  the  final  version  to  be  published.  K.G., 
R.P.,  S.C.,  P.M.,  G.D.  and  F.K.  agreed  to  be  accountable  for  all  aspects  of  the  work. 

Abstract:  With  new  technological  advances,  advice  can  come  from  different  sources  such  as 
machines  or  humans,  but  how  individuals  respond  to  such  advice  and  the  neural  correlates  involved 
need  to  be  better  understood.  We  combined  functional  MRI  and  multivariate  Granger  causality 
analysis with an X-ray luggage-screening task to investigate the neural basis and corresponding
effective  connectivity  involved  with  advice  utilization  from  agents  framed  as  experts.  Participants 
were  asked  to  accept  or  reject  good  or  bad  advice  from  a  human  or  machine  agent  with 
manipulated  reliability  (high  false  alarm  rate).  We  showed  that  unreliable  advice  decreased 
performance  overall  and  participants  interacting  with  the  human  agent  had  a  greater  depreciation  of 
advice  utilization  during  bad  advice.  These  differences  in  advice  utilization  can  be  due  to 
reevaluation  of  expectations  arising  from  association  of  dispositional  credibility  for  each  agent.  We 
demonstrated  that  differences  in  advice  utilization  engaged  brain  regions  associated  with  evaluation 
of  personal  characteristics  and  traits  (precuneus,  posterior  cingulate  cortex,  temporoparietal 
junction)  and  interoception  (posterior  insula).  We  found  that  the  right  posterior  insula  and  left 
precuneus  were  the  drivers  of  the  advice  utilization  network  that  were  reciprocally  connected  to 
each  other  and  also  projected  to  all  other  regions.  Our  behavioral  and  neuroimaging  results  have 
significant  implications  for  society  because  of  progressions  in  technology  and  increased 
interactions  with  machines. 

Finally,  another  manuscript  entitled  “An  fMRI  and  effective  connectivity  study  investigating 
miss  errors  during  advice  utilization  from  human  and  machine  agents”  based  on  the  findings 
of  study  3  is  currently  under  review  as  an  original  research  article  in  the  journal  “Social 
Neuroscience”: 

Authors:  Kimberly  Goodyear,  Raja  Parasuraman,  Sergey  Chernyak,  Ewart  de  Visser,  Poornima 
Madhavan,  Gopikrishna  Deshpande,  Frank  Krueger 
Author Contributions: K.G. and S.C. acquired the data for analysis. K.G., R.P. and F.K. contributed
to  the  conception  of  the  design.  K.G.,  R.P.,  S.C.,  P.M.,  G.D.  and  F.K.  contributed  to  interpretation 
of  the  data.  K.G.,  R.P.,  S.C.,  E.D.V.,  P.M.,  G.D.  and  F.K.  contributed  to  drafting  of  the  work  and 
revising  it  critically.  K.G.,  R.P.,  S.C.,  E.D.V.,  P.M.,  G.D.  and  F.K.  approved  the  final  version  to  be 
published.  K.G.,  R.P.,  S.C.,  E.D.V.,  P.M.,  G.D.  and  F.K.  agreed  to  be  accountable  for  all  aspects  of 
the  work. 

Abstract: As society becomes more reliant on machines and automation, understanding how people
utilize  advice  is  a  necessary  endeavor.  Our  objective  was  to  reveal  the  underlying  neural 
mechanisms  during  advice  utilization  from  expert  human  and  machine  agents  with  fMRI  and 
multivariate  Granger  causality  analysis.  During  an  X-ray  luggage-screening  task,  participants 
accepted  or  rejected  good  or  bad  advice  from  either  the  human  or  machine  agent  framed  as  experts 
with  manipulated  reliability  (high  miss  rate).  We  showed  that  the  machine-agent  group  decreased 
their  advice  utilization  compared  to  the  human-agent  group  and  these  differences  in  behaviors 
during  advice  utilization  could  be  accounted  for  by  high  expectations  of  reliable  advice  and 
changes  in  attention  allocation  due  to  miss  errors.  Brain  areas  involved  with  the  salience  and 
mentalizing  networks,  as  well  as  sensory  processing  involved  with  attention,  were  recruited  during 
the  task  and  the  advice  utilization  network  consisted  of  attentional  modulation  of  sensory 
information  with  the  lingual  gyrus  as  the  driver  during  the  decision  phase  and  the  fusiform  gyrus  as 
the  driver  during  the  feedback  phase.  Our  findings  expand  on  the  existing  literature  by  showing  that 
misses  degrade  advice  utilization,  which  is  represented  in  a  neural  network  involving  salience 
detection  and  self-processing  with  perceptual  integration. 
ABSTRACT


THE  NEURAL  BASIS  OF  ADVICE  UTILIZATION  DURING  HUMAN  AND 
MACHINE  AGENT  INTERACTIONS 

Kimberly  S.  Goodyear,  Ph.D. 

George  Mason  University,  2016 

Dissertation  Director:  Dr.  Frank  Krueger 


Understanding  how  individuals  utilize  advice  from  humans  and  machines  has  become 
progressively  more  pertinent  as  technological  advances  have  pervaded  our  society.  With 
an  increasing  shift  towards  relying  on  automation,  the  necessity  to  understand  the 
complex  interactions  that  exist  between  humans  and  automation  has  emerged.  This  thesis 
examines  the  behavioral,  cognitive  and  neural  mechanisms  involved  with  advice 
utilization from human and machine agents framed as experts. Two studies
were implemented, each consisting of an X-ray luggage-screening task with functional
magnetic resonance imaging and effective connectivity analysis. To assess advice-taking
differences between humans and machines across both studies, the agents’ reliability was
manipulated  with  high  error  rates.  To  fully  ascertain  how  individuals  respond  to 
unreliable  advice,  the  focus  of  Chapter  Two  was  on  false  alarms,  while  in  Chapter  Three 
the focus was on misses. In each study, we demonstrated that there were unique
behavioral responses and brain activation patterns, but in both studies participant
performance  levels  declined  overall.  In  Chapter  Two,  we  showed  that  participants 
interacting  with  the  human  agent  had  a  greater  depreciation  of  advice  utilization  during 
bad  advice  and  there  was  activation  in  brain  regions  associated  with  evaluation  of 
personal  characteristics,  traits  and  interoception.  In  addition,  the  effective  connectivity 
analysis  revealed  that  the  right  posterior  insula  and  left  precuneus  were  the  drivers  of  the 
network  that  were  reciprocally  connected  to  each  other  and  also  projected  to  all  other 
regions  (right  precuneus,  posterior  cingulate  cortex,  rostrolateral  prefrontal  cortex  and 
posterior  temporoparietal  junction).  In  Chapter  Three,  we  demonstrated  that  advice 
utilization  decreased  more  for  the  machine-agent  group  and  brain  areas  involved  with  the 
salience  and  mentalizing  networks,  as  well  as  sensory  processing  involved  with  attention, 
were  recruited  during  the  task.  The  effective  connectivity  analysis  showed  that  the 
lingual  gyrus  was  the  driver  during  the  decision  phase  that  projected  to  all  other  target 
regions  (anterior  cingulate  cortex,  precuneus  and  cuneus)  and  the  fusiform  gyrus  was  the 
driver  during  the  feedback  phase  that  sent  an  output  to  the  inferior  parietal  lobule.  The 
contribution  of  this  thesis  is  a  greater  comprehension  of  the  decision-making  processes 
involved during advice taking, which may serve as a building block for uncovering the
different factors involved with human-machine interactions.


CHAPTER  ONE:  GENERAL  INTRODUCTION 


The prevalence of new technology in our society today has created an increased
reliance on automation, and progressions in mechanization and automated aids
offer a heuristic for decreasing human workload in manual labor (Mosier, Skitka,
Heers, & Burdick, 1998; Parasuraman & Riley, 1997). For example, in 2013, the Federal
Aviation  Administration  published  a  report  on  the  operational  use  of  flight  path 
management  systems  that  showed  that  pilot  interaction  with  automation  may  result  in 
overreliance  and  over  50%  of  accidents  reviewed  were  due  to  the  pilot’s  reduced 
situational  awareness  (Federal  Aviation  Administration,  2013).  With  this  shift  of  job 
roles  from  human  to  automation,  understanding  how  individuals  vary  in  response  to 
automation  has  become  more  pertinent  as  potential  issues  may  arise.  To  better 
comprehend  the  complex  nature  of  human-machine  interactions,  the  rest  of  this  chapter 
will  explore  advice  utilization  and  the  effect  of  errors  in  greater  detail. 

1.1  Advice  Utilization 

The  ways  in  which  individuals  respond  to  advice  can  vary  depending  on  different 
factors  involved  during  those  social  interactions.  For  example,  variables  such  as  source 
credibility  and  type  of  advice  can  influence  whether  a  person  utilizes  or  discounts  the 
advice  given  to  them.  Studies  have  demonstrated  that  expert  advice  is  used  more  than 
novice advice (Sniezek, Schrah, & Dalal, 2004) and poor (inaccurate) advice is
discounted more than good (accurate) advice (Yaniv & Kleinberger, 2000). A study
investigating perceptions of decision aids revealed that measures of trust varied
depending on the pedigree (novice vs. expert) of the human or automated aid (Madhavan
& Wiegmann, 2007), revealing that advice acceptance between humans and machines
differs depending on source credibility. Moreover, the authors postulate that advice
utilization strategies for humans and automation may differ due to dispositional
credibility and high expectations of reliable advice. In addition, the decision to accept or
reject advice may be influenced by the reliability of the source. For instance, it has been
shown that automation characteristics such as reliability, predictability and ability can
affect how people respond to imperfect automation (Lee & See, 2004). Initial
expectations of reliable advice can be altered when disconfirmation evidence about an
agent’s credibility is revealed. For example, a study demonstrated that initial
confirmatory experience can increase how much a person follows bad advice, which
ultimately impacts decision-making behaviors (Staudinger & Büchel, 2013). This
phenomenon can be explained in terms of an expectation disconfirmation theory, where
an expectation is a belief that someone or something will live up to what is anticipated
and disconfirmation is a discrepancy in the evaluation of that expectation (Oliver, 1980).

Neuroimaging studies investigating advice taking, personal traits, dispositions and
human-robot interactions have revealed the involvement of regions associated with the
salience, mentalizing and central executive networks (Brosch, Schiller, Mojdehbakhsh,
Uleman, & Phelps, 2013; Chaminade et al., 2012; Krach et al., 2008; Suen, Brown,
Morck, & Silverstone, 2014). Menon (2011) proposed a model involving three
large-scale brain networks, including the central executive network (CEN), the salience
network (SN) and the default-mode network (DMN); the CEN (dorsolateral PFC, dlPFC)
has been postulated to be involved with higher-order cognitive functions such as
decision-making, the SN (dorsal ACC, AI) has been implicated in saliency detection of
internal and external events and the DMN (PCC, PreC) has been revealed to be
associated  with  self-processing  cognitions.  A  study  investigating  tracking  of  expertise 
for humans and algorithms found that areas associated with the mentalizing and
salience networks (e.g., ACC, precuneus) were engaged during estimates of the agents’ abilities
(Boorman,  O'Doherty,  Adolphs,  &  Rangel,  2013).  In  addition,  studies  investigating 
observations  of  human  and  robot  interactions  (Suen  et  al.,  2014)  and  inferences  of  mental 
states  for  humans  and  machines  (Chaminade  et  al.,  2012),  as  well  as  studies  examining 
attribution  of  personal  traits  and  characteristics  (Cabanis  et  al.,  2013)  have  shown 
recruitment  of  areas  associated  with  large-scale  brain  networks.  Given  considerable 
overlap  between  the  aforementioned  neural  networks,  the  overall  aim  of  this  thesis  was  to 
investigate  the  underlying  mechanisms  involved  with  advice  taking  from  humans  and 
machines  to  provide  potential  evidence  about  how  individuals  perceive  and  utilize  advice 
from  different  agents. 

1.2  Errors 

To provide a background understanding of the circumstances in which a person
makes decisions under uncertainty (i.e., unreliable advice), Signal Detection
Theory (SDT) can show how differences in advice utilization pertain to differences in
error types (Tanner Jr & Swets, 1954). To measure the individual responses according to
performance rates, there are signal absent (correct rejection [correct non-alert], false
alarm [incorrect alert]) and signal present (hit [correct alert], miss [incorrect non-alert])
distributions. Looking at the different error types (false alarms and misses) within a
decision  matrix  allows  for  an  even  greater  interpretation  of  the  factors  involved  with 
advice  utilization.  For  example,  it  has  been  shown  that  false  alarms  can  hurt  overall 
performance,  operator  compliance  (agreeing  when  the  aid  indicates  the  target  is  present) 
and  operator  reliance  (agreeing  when  the  aid  indicates  the  target  is  absent),  while  misses 
only  affect  operator  reliance  (Dixon,  Wickens,  &  McCarley,  2007;  Rice  &  McCarley, 
2011).  However,  there  are  conflicting  views  pertaining  to  the  overlap  between 
compliance  and  reliance,  which  warrants  further  exploration  on  the  topic  (Dixon  et  al., 
2007;  Meyer,  2004).  Moreover,  it  has  been  revealed  that  false  alarms  may  cause 
operators to not respond to alerts at all, which has been coined the “cry wolf effect”
(Breznitz, 2013; Wickens et al., 2009); furthermore, misses may affect monitoring
strategies  during  non-alarm  periods  causing  a  change  in  attention  allocation  strategies 
(Onnasch,  Ruff,  &  Manzey,  2014). 
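
The decision matrix described above can be sketched in a few lines of Python. This is only an illustration: the function names and the five example trials are hypothetical and are not taken from the studies. Compliance and reliance follow the definitions given in the text (agreement when the aid signals target present or target absent, respectively).

```python
from collections import Counter


def classify(target_present: bool, alert: bool) -> str:
    """Map one piece of advice from an aid to its signal-detection category."""
    if target_present:
        return "hit" if alert else "miss"
    return "false_alarm" if alert else "correct_rejection"


def summarize(trials):
    """Summarize (target_present, alert, operator_agreed) tuples.

    Returns the category counts, compliance (agreement rate on alerts)
    and reliance (agreement rate on non-alerts).
    """
    counts = Counter()
    agreed_alert = n_alert = agreed_quiet = n_quiet = 0
    for target_present, alert, agreed in trials:
        counts[classify(target_present, alert)] += 1
        if alert:
            n_alert += 1
            agreed_alert += bool(agreed)
        else:
            n_quiet += 1
            agreed_quiet += bool(agreed)
    compliance = agreed_alert / n_alert if n_alert else None
    reliance = agreed_quiet / n_quiet if n_quiet else None
    return counts, compliance, reliance


# Hypothetical trials, for illustration only.
trials = [
    (True, True, True),    # hit, operator agrees
    (True, False, True),   # miss, operator agrees
    (False, True, False),  # false alarm, operator disagrees
    (False, False, True),  # correct rejection, operator agrees
    (False, True, True),   # false alarm, operator agrees
]
counts, compliance, reliance = summarize(trials)
```

Separating the two agreement rates in this way makes it possible to see whether an error type lowers compliance, reliance, or both.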

A  comprehensive  review  by  McBride,  Rogers,  and  Fisk  (2014)  determined  that 
management  of  automation  errors  can  be  broken  up  into  a  framework  of  four  variables: 
automation  characteristics  (e.g.,  reliability),  person  factors  (e.g.,  complacency),  tasks 
when  humans  and  automation  work  together  (e.g.,  automation-error  costs)  and  emergent 
factors  that  can  arise  during  interactions  (e.g.,  trust  in  automation).  Automation 
characteristics such as reliability can provide valuable insight into operator response and
performance when an aid performs near perfectly or becomes unreliable. For example, if
an aid has high reliability, this can lead to misuse, or overreliance on an aid; when an aid
has low reliability, this can lead to disuse, or ignoring alerts from an aid or disabling its
functions (Parasuraman & Riley, 1997). A study showed the optimal reliability to be
70%, with anything below that point impairing an individual’s performance, demonstrating
the importance of reliable advice (Wickens & Dixon, 2007). In addition, person factors
such as complacency illustrate how individual differences can affect the use of
automation. For instance, complacency can occur when automation performance is near
perfect, resulting in reduced monitoring and vigilance (Parasuraman, Molloy, & Singh,
1993). Previous research on the topic indicates that varying reliability may disrupt
complacency (McBride et al., 2014) and complacent behaviors may be due to conditions
under multiple-task load (Parasuraman & Manzey, 2010). However, the measurement of
complacency and how it is defined are not clearly delineated (Parasuraman et al., 1993).
Task variables such as automation-error consequences and accountability can reveal how
environmental contexts influence teamwork between humans and automation. For
example, accountability in pilot cockpits has been shown to be higher when the
accountability is internalized (Mosier et al., 1998) and performance accountability can
lead to less automation bias (Skitka, Mosier, & Burdick, 2000). Lastly, emergent factors
such as trust in automation or mental workloads are components that can alter how an
individual manages errors. A study by Merritt, Heimbaugh, LaChapell, and Lee (2013)
investigated trust towards automation with an X-ray luggage-screening task and the
authors concluded that implicit attitudes significantly predicted automation trust.
Furthermore, it has been revealed that appropriately calibrating operator trust can mitigate
any potential issues that can arise during human-automation interactions (Lee & See,
2004) and relative trust may be an essential factor involved with a framework for
automation use (Dzindolet, Beck, Pierce, & Dawe, 2001). Studies investigating
automation use have demonstrated that there are many different variables contributing to
how individuals manage errors from automation.

Furthermore, brain activity in response to error monitoring and processing has
been measured with fMRI as well as event-related potential (ERP). For instance, a study
examining error monitoring during a Go/NoGo task with fMRI and ERP correlations
demonstrated that error and conflict monitoring both show involvement with distinct
ACC regions (Mathalon, Whitfield, & Ford, 2003). The ACC has been shown to be
involved with a wide range of cognitive functions involving decision-making and
attention (Bush, Luu, & Posner, 2000), as well as error detection and performance
monitoring (Carter et al., 1998; Kiehl, Liddle, & Hopfinger, 2000). Shenhav, Botvinick,
and Cohen (2013) postulated that dorsal ACC functionality is based on a model of
expected value of control that integrates expected payoffs and rewards. Moreover,
cortical activity in sensory brain areas in response to prediction errors has also been
examined (Hesselmann, Sadaghiani, Friston, & Kleinschmidt, 2010). A study measuring
cortical activity in response to signal detection categories revealed that false alarms
evoked more cortical activity than misses, which may be due to individual perceptions
involved with each type of error (Ress & Heeger, 2003). There is extensive evidence that
the ACC is involved with error monitoring and that there are perceptual differences
involved with each error type; however, the neural basis associated with error processing
during unreliable advice from human and machine agents has not been determined and
thus warrants further examination.

1.3  Overview  of  the  Studies 

The  purpose  of  the  studies  was  to  examine  how  errors  moderate  advice  utilization 
when  comparing  humans  and  machines  by  revealing  the  behavioral  and  neural 
mechanisms associated with advice taking. Furthermore, the research
provides insight into the numerous factors that can influence advice utilization by
investigating  decision-making  processes  in  conjunction  with  functional  magnetic 
resonance  imaging  (fMRI)  and  effective  connectivity  analysis.  Recent  behavioral 
research  has  provided  evidence  for  advice  utilization  differences  between  humans  and 
machines;  however,  the  underlying  neural  mechanisms  involved  with  human-machine 
interactions remain to be explored. In both studies, participants partook in an X-ray
luggage-screening task where they received good and bad advice from either a machine
or human agent framed as an expert, made decisions to accept or reject the advice and
then received feedback on whether their decision was correct or incorrect. Based upon previous
findings  that  false  alarms  degrade  trust  and  hurt  overall  performance  more  than  misses 
(Dixon  et  al.,  2007),  we  aimed  to  reveal  the  influence  of  bad  advice  on  decision-making 
processes  by  manipulating  agent  reliability  with  different  error  types  (false  alarms, 
misses). Specifically, in both studies, the reliability of the agents was 60% (40% errors),
but in Chapter Two, the focus was on false alarms while, in Chapter Three, the focus was
on  misses.  We  expected  performance  and  advice  utilization  to  be  lower  in  Chapter  Two 
compared  to  Chapter  Three  due  to  the  differences  in  error  types.  In  addition,  we 
expected  that  errors  would  decrease  overall  performance  for  both  studies  and  that  this 
would  ultimately  lead  to  degradation  of  advice  utilization.  The  differences  in  advice 
utilization  would  be  further  highlighted  when  comparing  the  human  agent  to  the  machine 
agent  due  to  factors  such  as  expectations  of  reliable  advice,  agent  performance  and 
dispositional  credibility  associated  with  each  agent.  Lastly,  we  expected  that  brain 
regions  corresponding  with  the  default-mode  network  (e.g.,  TPJ,  PreC)  and  the  salience 
network  (e.g,  AI,  ACC)  to  be  recruited  during  these  studies  due  to  violations  of 
expectations  stemming  from  unreliable  advice,  salience  detection  of  errors  and  attribution 
of  dispositional  credibility  for  each  agent. 
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CHAPTER  TWO:  THE  IMPACT  OF  FALSE  ALARMS  ON  ADVICE 

UTILIZATION 


2.1  Abstract 

With new technological advances, advice can come from different sources such as
machines or humans, but how individuals respond to such advice and the neural
correlates involved need to be better understood. We combined functional MRI and
multivariate Granger causality analysis with an X-ray luggage-screening task to
investigate the neural basis and corresponding effective connectivity involved with
advice utilization from agents framed as experts. Participants were asked to accept or
reject good or bad advice from a human or machine agent with manipulated reliability
(high false alarm rate). We showed that unreliable advice decreased performance overall
and that participants interacting with the human agent had a greater depreciation of
advice utilization during bad advice. These differences in advice utilization may be due
to reevaluation of expectations arising from association of dispositional credibility for
each agent. We demonstrated that differences in advice utilization engaged brain regions
associated with evaluation of personal characteristics and traits (precuneus, posterior
cingulate cortex, temporoparietal junction) and interoception (posterior insula). We
found that the right posterior insula and left precuneus were the drivers of the advice
utilization network: they were reciprocally connected to each other and also projected to
all other regions. Our behavioral and neuroimaging results have significant implications
for society given advances in technology and increasing interactions with
machines.

2.2 Introduction


Individuals often encounter situations in their everyday lives when they must rely on
advice from others. With new technological advances, advice can come from not only
humans, but also automated devices such as a Global Positioning System. For instance,
to provide advanced safety measures, the Transportation Security Administration (TSA)
has implemented X-ray luggage scanners and Advanced Imaging Technology (AIT) for
screening passengers and exposing potential security threats (Transportation Security
Administration, 2014). Numerous factors can alter the valuation of advice, such as
self-confidence (Bonaccio & Dalal, 2006; Lee & Moray, 1992; Riley, 1996), user trust (P.
Madhavan & D. A. Wiegmann, 2007b; Mayer, Davis, & Schoorman, 1995; Rotter, 1967),
source credibility (i.e., expert) (Birnbaum & Stegner, 1979; Madhavan & Wiegmann,
2007a; Van Swol & Sniezek, 2005) and source reliability/performance (Bonaccio &
Dalal, 2006). Understanding how people utilize advice is becoming necessary to provide
useful insight for developing safety measures and appropriate guidelines to predict
human behaviors.

Individuals may vary in how they respond to advice, and studies have shown that
expert advice is more frequently used (Sniezek, Schrah, & Dalal, 2004) and more
persuasive than novice advice (Jungermann, Fischer, Betsch, & Haberstroh, 2005). In
addition, people may respond to advice from automation and humans in similar ways
under the premise of a "perfect automation schema," in which an individual believes that
automated aids are near perfect (Dzindolet, Pierce, Beck, & Dawe, 2002). Moreover,
factors such as dispositional credibility can alter trust between human and machine
advisors due to differences in personal traits such as loyalty or benevolence. For
example, it has been postulated that association of dispositional credibility is higher for
human agents due to evaluation of personal traits, while automated agents may be judged
more by performance levels (Madhavan & Wiegmann, 2007a). However, when
expectations of reliable advice are altered due to disconfirming evidence about an
advisor's credibility, decision-making behaviors can be impacted. For example,
consistent with disconfirmation theory (Oliver, 1980), decision-making can be affected by
initial confirmatory experiences, which can be influenced by bad advice (Staudinger &
Buchel, 2013).

Despite existing knowledge of the cognitive processes that affect advice taking,
the neural mechanisms and the underlying effective connectivity network involved with
good and bad advice from human and machine agents framed as experts remain to be
explored. Recent neuroimaging studies have investigated the role of expert advice during
decision-making (Boorman, O'Doherty, Adolphs, & Rangel, 2013; Meshi, Biele, Korn, &
Heekeren, 2012), social learning (Biele, Rieskamp, Krugel, & Heekeren, 2011;
Staudinger & Buchel, 2013) and disobedience (Suen, Brown, Morck, & Silverstone,
2014). Furthermore, the neural activity involved with assigning traits and intentions to
others (Mitchell, Macrae, & Banaji, 2006; Saxe & Kanwisher, 2003), self-attributional
processes (Cabanis et al., 2013), as well as human-robot interactions during an interactive
rock-paper-scissors game (Chaminade et al., 2012) and during observations of social
interactions (Wang & Quadflieg, 2015) has been investigated. Overall, key regions
associated with the default network (e.g., temporoparietal junction, precuneus, posterior
cingulate cortex, medial prefrontal cortex) and the salience network (dorsal anterior
cingulate cortex, bilateral insulae) have been identified as playing a role during advice
taking, evaluation of personal traits and human-robot interactions (Engelmann,
Capra, Noussair, & Berns, 2009; Krach et al., 2008).

We aimed to elucidate the neural basis of advice utilization from different agents
and the corresponding effective connectivity in the underlying brain network by
combining an X-ray luggage-screening task and functional magnetic resonance imaging
(fMRI) with multivariate Granger causality analysis. The focus of this study was to
examine the impact of false alarms on advice-taking behaviors based on previous
evidence that false alarms degrade trust and hurt overall performance more than misses
(Dixon, Wickens, & McCarley, 2007). On the behavioral level, we hypothesized that
unreliable advice would decrease performance (i.e., accuracy) and advice utilization due
to disconfirming evidence about the agents' perceived expertise. We further assumed
that advice utilization would decrease more during bad advice due to disconfirming
evidence stemming from advice-incongruent experiences (i.e., high false alarm rates)
(Dixon et al., 2007) and also over time as errors became more apparent due to
participants' reevaluation of the agent's performance (Skitka, Mosier, & Burdick, 2000).
In addition, we expected that advice utilization would decrease more for the machine
agent compared to the human agent due to differences in dispositional credibility between
humans and machines (Madhavan & Wiegmann, 2007a). On the neural level, we first
predicted activation differences in brain regions associated with attribution of personal
traits and dispositions (Brosch, Schiller, Mojdehbakhsh, Uleman, & Phelps, 2013; Harris,
Todorov, & Fiske, 2005). Secondly, when comparing the human to the machine agent
during bad advice over time, we expected regions such as the precuneus and posterior
cingulate cortex to be the drivers of the underlying advice utilization network.

2.3  Methods 
Subjects 

Three studies were conducted according to the ethical guidelines and principles of the
Declaration of Helsinki. For the normative rating study, twenty-three male students (age
(M ± SD) = 24.0 ± 2.6) from George Mason University (GMU) participated to
standardize the X-ray luggage images for the experimental studies. For the behavioral
study, ten volunteers (6 males, 4 females; age = 22.3 ± 2.9) participated to complete an
X-ray luggage-screening task without receiving advice. For the fMRI study, twenty-four
healthy right-handed volunteers (13 males, 11 females; age = 20.0 ± 2.6), with handedness
determined by the Edinburgh Handedness Inventory (Right-handedness: 94.5 ± 7.7)
(Oldfield, 1971), participated in the X-ray luggage-screening task while receiving advice
either from a human or machine agent. All participants gave written consent approved by
GMU's Institutional Review Board and received financial compensation for their
participation.

Stimuli 

During the normative rating study, the participants rated 320 X-ray images on three
dimensions: clutter (4.1 ± 0.3), general difficulty (3.5 ± 0.4), and confidence in finding
the target (3.2 ± 0.6), each on a 7-point Likert scale (1 = very low to 7 = very high)
(Madhavan & Gonzalez, 2006). From those images, 64 (32 target and 32 non-target)
images were chosen for the experimental studies based on the standardized ratings
(Appendix A.1a).

X-ray  Luggage-Screening  Task 

In the X-ray luggage-screening task, participants were asked to search for the presence or
absence of a knife. In the behavioral study, participants did not receive advice and
performed the task unassisted; participants in the fMRI study received good
(advice-congruent) or bad (advice-incongruent) advice from either a human or machine agent.
For both studies, the reliability was set to 60%: good advice comprised 50% hits (correct
alerts) and 10% correct rejections (correct non-alerts), and bad advice comprised 40% false
alarms (incorrect alerts) (Appendix A.1b).
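
The reliability figures above are stated as proportions of all trials. The following minimal sketch re-expresses them as conditional rates, which makes the "high false alarm rate" explicit. The function name and the returned keys are illustrative assumptions; only the three proportions (50%, 10%, 40%) come from the text.

```python
def advice_rates(p_hit, p_correct_rejection, p_false_alarm, p_miss=0.0):
    """Summarize an advice schedule given as joint proportions of all trials.

    Hypothetical helper for illustration; it is not part of the study's code.
    """
    total = p_hit + p_correct_rejection + p_false_alarm + p_miss
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    p_target = p_hit + p_miss                        # trials with a knife present
    p_no_target = p_correct_rejection + p_false_alarm  # trials without a knife
    return {
        # share of trials on which the advice was correct
        "reliability": p_hit + p_correct_rejection,
        # P(advice = "search" | knife present)
        "hit_rate": p_hit / p_target,
        # P(advice = "search" | no knife present)
        "false_alarm_rate": p_false_alarm / p_no_target,
        # P(knife present | advice = "search")
        "alert_precision": p_hit / (p_hit + p_false_alarm),
    }


# Proportions reported for this study: 50% hits, 10% correct rejections,
# 40% false alarms, and no misses.
rates = advice_rates(0.50, 0.10, 0.40)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under these proportions the agent never misses a knife, while most "search" advice on knife-free bags is a false alarm, which is the manipulation this chapter focuses on.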

On each trial, the participants saw a set of phases including a fixation cross (0.5
s), advice from one of the agents to "search" or "clear" the bag (2 s), an image of the
X-ray luggage (4 s), a decision to accept or reject the advice of the agent to "search" or
"clear" the bag (4 s), jitter (~4 s), feedback indicating if their decision was correct or
incorrect (2.0 s) and lastly, jitter (~4 s). The jitter times were generated by an fMRI
simulator software (http://www.mccauslandcenter.sc.edu/CRNL/tools/fmrisim) that
optimized the timing and consisted of a minimum of 1 second and an average of 4 seconds
(Appendix A.1c). Participants used response pads to respond. They were given an
initial endowment of $40, and each incorrect answer resulted in a deduction of $0.30 from
the remaining total. Performance, advice utilization, response times, and monetary
deductions were collected during the experiment. The stimuli were presented using
E-Prime 2.0 (Psychology Software Tools, Inc., http://www.pstnet.com/eprime.cfm).
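
As a worked illustration of the trial structure and payment rule described above, the sketch below adds up the fixed phase durations with the average jitter and applies the $0.30 deduction. The constant and function names are assumptions made for this example; the durations, endowment and deduction are taken from the text, and the individual jitter values (which varied per trial) are replaced by their stated average.

```python
# Fixed phase durations in seconds, as listed in the task description.
FIXED_PHASES_S = {
    "fixation": 0.5,
    "advice": 2.0,
    "bag": 4.0,
    "decision": 4.0,
    "feedback": 2.0,
}
MEAN_JITTER_S = 4.0      # reported average jitter (minimum 1 s)
JITTERS_PER_TRIAL = 2    # one after the decision, one after the feedback


def mean_trial_duration_s():
    """Average trial length when each jitter takes its mean value."""
    return sum(FIXED_PHASES_S.values()) + JITTERS_PER_TRIAL * MEAN_JITTER_S


def remaining_endowment(n_incorrect, endowment_cents=4000, penalty_cents=30):
    """Dollars left after n_incorrect wrong answers (integer cents avoid float drift)."""
    return (endowment_cents - penalty_cents * n_incorrect) / 100


print(mean_trial_duration_s())     # average seconds per trial
print(remaining_endowment(10))     # dollars left after ten incorrect answers
```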

Procedure 

Pre-Experimental Phase. The participants came one to two weeks before the fMRI
experiment to complete self-report questionnaires as control measures to investigate
individual differences between the agent groups. The control measures included:
Interpersonal Reactivity Index (IRI, separate facets of empathy) (Davis, 1983),
Complacency-Potential Rating Scale (CPS, feelings towards automation) (Singh, Molloy,
& Parasuraman, 1997), National Readiness Technology Scale (NTRS, embracing new
technologies) (Parasuraman, 2000), NEO Five-Factor Inventory (NEO-FFI, personality
styles) (Costa & McCrae, 1992), and Propensity to Trust (PTT, trust towards automation)
(Merritt, Heimbaugh, LaChapell, & Lee, 2013).

Experimental Phase. Before participants completed a practice run for the fMRI
experiment, they read descriptions about the human or machine agents (reliability was
not disclosed) (Appendix A.2). They were then asked to rate their trust in, and the
reliability of, the human or machine agent on a 10-point Likert scale (0 = very low, 10 =
very high). During the four trials of the practice run, participants familiarized
themselves with the X-ray luggage-screening task and the five possible knives that could
be present in the bags. The participants then completed two runs of the experimental task
while in the scanner and afterwards they were again asked to rate reliability and trust.

Post-Experimental Session. After the fMRI experiment, participants were asked to rate
their confidence in finding the target (i.e., knife) in each of the X-ray luggage images
presented during the fMRI experiment on a 10-point Likert scale (1 = very low, 10 = very
high).

fMRI  Data  Acquisition 

Imaging data were acquired on a 3 T head-only scanner (Siemens Allegra) with a
circularly polarized, transmit/receive head coil at the Krasnow Institute for Advanced
Study, GMU, Virginia. The anatomical imaging data were based on a 3D T1-weighted
MPRAGE sequence with TR = 2300 ms, TE = 3.37 ms, flip angle = 7°, slice thickness =
1 mm, voxel dimensions = 1 mm x 1 mm x 1 mm and number of slices = 160. The
functional imaging data were based on a 2D gradient-echo EPI sequence with TR = 2000
ms, TE = 30 ms, flip angle = 70°, slice thickness = 3 mm, voxel dimensions = 3 mm x 3
mm x 3 mm, number of slices = 33 per volume in an axial orientation parallel to the
anterior-posterior commissure. The first two volumes were discarded to allow for T1
equilibrium effects and a total of 330 volumes were taken for each run.

Behavioral  Data  Analysis 

Behavioral data analysis was carried out with the Statistical Package for the Social
Sciences 20.0 (SPSS 20.0, IBM Corp.) with alpha set to p < .05 (two-tailed). Data were
normally distributed (Kolmogorov-Smirnov test) and assumptions for analyses of variance
(Bartlett's test) were not violated. We first investigated task performance (i.e., accuracy)
between the agent groups and the no agent group by employing one-way analysis of
variance (ANOVA) with Agent (human, machine, no agent) as the between-subjects
factor. Next, we looked at advice utilization, response times and monetary deductions
with mixed 2 x 2 x 2 repeated-measures ANOVAs with Advice (good, bad) and Time
(run 1, run 2) as within-subjects factors and Agent (human, machine) as the
between-subjects factor. In addition, we investigated reliability, trust and confidence
ratings with mixed 2 x 2 repeated-measures ANOVAs with Time (pre, post) as the
within-subjects factor for the reliability/trust ratings and Target (yes, no) as the
within-subjects factor for the confidence ratings and with Agent (human, machine) as the
between-subjects factor. Lastly, we performed bivariate Spearman's correlations to identify
associations between behavioral and control measures as well as independent t-tests
between the agent groups to investigate group differences.
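
To make the first analysis step concrete, here is a minimal pure-Python sketch of a one-way between-subjects ANOVA. It is not the SPSS procedure used in the study. The accuracy values are randomly generated placeholders, and the group sizes (12, 12 and 10) are an assumption inferred from the sample sizes and degrees of freedom reported in this chapter.

```python
import random


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Placeholder accuracy scores (hypothetical, for illustration only).
rng = random.Random(0)
human = [rng.gauss(0.60, 0.08) for _ in range(12)]
machine = [rng.gauss(0.60, 0.08) for _ in range(12)]
no_agent = [rng.gauss(0.75, 0.08) for _ in range(10)]

f_value, df_between, df_within = one_way_anova([human, machine, no_agent])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With three groups of these sizes the test has 2 and 31 degrees of freedom, which matches the form in which the Agent effect on performance is reported in the Results.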

fMRI  Data  Analysis 

The fMRI data analysis was carried out using NeuroElf software (http://neuroelf.net) and
BrainVoyager QX 2.8 (Brain Innovation). The functional imaging data were
preprocessed using Statistical Parametric Mapping 8 (SPM8, Wellcome Department of
Cognitive Neurology) functions batched via NeuroElf, including three-dimensional
motion correction (six parameters) and slice-scan time correction (temporal interpolation).
A mean functional image was computed for each participant across all runs. The
mean functional image was then co-registered with the anatomical images using a
joint histogram for the different contrast types. Preprocessing of the anatomical images
included segmenting images with a unified segmentation procedure (Ashburner &
Friston, 2005), and spatial warping was applied to the functional data to normalize the
data to a standard Montreal Neurological Institute (MNI) brain template. Lastly, spatial
smoothing (Gaussian filter of 6 mm FWHM) was applied to the images to account for
any residual differences across participants. A general linear model (GLM) that was
corrected for first-order serial correlations was estimated (Friston, Harrison, & Penny,
2003). The GLM consisted of thirty-six regressors based on advice utilization (accept,
reject) separated by advice (good, bad) and time (run 1, run 2) for each of the five phases
(fixation, advice, bag, decision, feedback) on each trial of the X-ray luggage-screening
task and six parametric regressors of no interest for the 3D motion correction
(translations in X, Y, Z directions, rotations around X, Y, Z axes). The regressor time
courses were adjusted for the hemodynamic response delay by convolution with a
dual-gamma canonical hemodynamic response function (Buchel, Holmes, Rees, & Friston,
1998). Random-effect analyses were performed at the multi-subject level to explore
brain regions associated with the decision and feedback phases.
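
The convolution step described above can be sketched as follows: a boxcar marking one task phase is convolved with a dual-gamma response function and sampled once per TR. This is an illustration under stated assumptions, not the NeuroElf or BrainVoyager implementation. The shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used defaults assumed here, since the text does not report them; the TR of 2 s and the 4 s decision duration come from the text.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for non-positive times."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def dual_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1 / 6):
    """Positive response minus a scaled, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)


def convolve_boxcar(onsets_s, duration_s, n_volumes, tr=2.0, dt=0.1, hrf_length_s=32.0):
    """Predicted BOLD time course for one condition, one value per volume."""
    n_fine = int(round(n_volumes * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0

    # Sample the response function on the fine grid; dt approximates the integral.
    kernel = [dual_gamma_hrf(k * dt) * dt for k in range(int(round(hrf_length_s / dt)))]

    step = int(round(tr / dt))
    predicted = []
    for volume in range(n_volumes):
        i = volume * step
        value = 0.0
        for k, weight in enumerate(kernel):
            if i - k < 0:
                break
            value += boxcar[i - k] * weight
        predicted.append(value)
    return predicted


# One hypothetical 4 s decision phase starting at scan onset, 20 volumes at TR = 2 s.
regressor = convolve_boxcar(onsets_s=[0.0], duration_s=4.0, n_volumes=20)
print([round(v, 3) for v in regressor])
```

The predicted response lags the event by several seconds, which is why the regressors are convolved before the model is fitted.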

To reveal brain activations associated with advice utilization, mixed 2 x 2 x 2
ANOVAs on parameter estimates were applied with Advice (good, bad) and Time (run 1,
run 2) as within-subjects factors and Agent (human, machine) as the between-subjects
factor. For the fMRI results, our main focus was on brain activations during the decision
and feedback phases for the three-way interaction since our a priori hypothesis was
based on the interaction of three factors (advice, time, agent) (see Appendix A.3 for main
effects for the decision and feedback phases). Activations for the decision and feedback
phases were reported after correcting for multiple comparisons using a cluster-level
statistical threshold (Cluster-level Statistical Threshold Estimator plugin in BrainVoyager
QX), which calculates the minimum cluster size to achieve a false activation probability
(α = 0.05) (Forman et al., 1995; Goebel, Esposito, & Formisano, 2006). The voxel-level
threshold was set at p < .005 (uncorrected) and the thresholded map was used for a
whole-brain correction criterion based on the estimate of the map's spatial smoothness
and on an iterative procedure (Monte Carlo simulation, 1,000 iterations). The activation
clusters were displayed in MNI space on an anatomical brain template reversed left to
right.

Effective  Connectivity  Analysis 

Investigation of the effective (or directional) brain connectivity in the network of
activated brain regions was performed through multivariate Granger causality analysis
(GCA) using custom MATLAB (www.mathworks.com) code as previously described
by Grant et al. (2014), Kapogiannis, Deshpande, Krueger, Thornburg, and Grafman
(2014) and Lacey, Stilla, Sreenivasan, Deshpande, and Sathian (2014). Granger causality
is based on a temporal precedence concept (Granger, 1969) that can be applied to
multivariate effective connectivity modeling of ROI (region of interest) time courses to
predict directional influences among brain regions (Deshpande, LaConte, James, Peltier,
& Hu, 2009; Friston et al., 2003; Preusse, van der Meer, Deshpande, Krueger, &
Wartenburger, 2011; Roebroeck, Formisano, & Goebel, 2005; K. Sathian et al., 2011;
Strenziok et al., 2010). The model examines the relationship of variables in time, such
that given two variables, a and b, if past values of a improve the prediction of the present
value of b beyond what past values of b alone provide, then causality between the
variables can be inferred as a function of their earlier time points (Hampstead et al.,
2011; Krueger, Landgraf, van der Meer, Deshpande, & Hu, 2011; Roebroeck et al., 2005).
GCA is advantageous for assessing effective connectivity since it is a data-driven
approach and does not require prespecified connectivity models, unlike dynamic causal
modeling (DCM) (Deshpande & Hu, 2012; Deshpande et al., 2009; Deshpande, Sathian,
Hu, & Buckhalt, 2012; Friston et al., 2003; Roebroeck et al., 2005). Recent GCA
investigations, including experimental applications (Abler et al., 2006) as well as
simulations (Deshpande, Sathian, & Hu, 2010b; Wen, Rangarajan, & Ding, 2013), have
shown its advantages and validity for assessing effective connectivity.
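
The temporal precedence idea can be illustrated with a deliberately simplified sketch: a bivariate, first-order, time-invariant comparison of two least-squares models on simulated data. It is not the multivariate dynamic model used in this study and omits hemodynamic deconvolution. All names, coefficients and the simulated series are assumptions made for the example.

```python
import math
import random


def demean(series):
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def residual_variance(y, predictors):
    """Least-squares residual variance of y on one or two predictors (no intercept)."""
    if len(predictors) == 1:
        x = predictors[0]
        coef = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        residuals = [b - coef * a for a, b in zip(x, y)]
    else:
        x1, x2 = predictors
        s11 = sum(a * a for a in x1)
        s22 = sum(a * a for a in x2)
        s12 = sum(a * b for a, b in zip(x1, x2))
        s1y = sum(a * b for a, b in zip(x1, y))
        s2y = sum(a * b for a, b in zip(x2, y))
        det = s11 * s22 - s12 * s12
        # Solve the 2 x 2 normal equations directly.
        c1 = (s1y * s22 - s2y * s12) / det
        c2 = (s2y * s11 - s1y * s12) / det
        residuals = [b - c1 * p - c2 * q for p, q, b in zip(x1, x2, y)]
    return sum(r * r for r in residuals) / len(residuals)


def granger_causality(source, target):
    """Log ratio of residual variances: target's own past vs. own past plus source's past."""
    source, target = demean(source), demean(target)
    present = target[1:]
    restricted = residual_variance(present, [target[:-1]])
    full = residual_variance(present, [target[:-1], source[:-1]])
    return math.log(restricted / full)


def simulate(n=500, seed=1):
    """Hypothetical data in which a drives b with a one-sample lag."""
    rng = random.Random(seed)
    a, b = [0.0], [0.0]
    for _ in range(n - 1):
        a.append(0.5 * a[-1] + rng.gauss(0, 1))
        b.append(0.8 * a[-2] + rng.gauss(0, 1))  # a[-2] is the previous sample of a
    return a, b


a, b = simulate()
gc_a_to_b = granger_causality(a, b)
gc_b_to_a = granger_causality(b, a)
print(f"a -> b: {gc_a_to_b:.3f}, b -> a: {gc_b_to_a:.3f}")
```

A value near zero means the source's past adds little to the prediction, while a clearly positive value indicates a directed influence in the Granger sense.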

Based on our effective connectivity hypotheses, only those regions that survived
the fMRI analysis threshold for the interaction effect of Advice (good, bad) x Time (run 1,
run 2) x Agent (human, machine) for the decision phase were selected as ROIs for the
subsequent multivariate GCA. Time series of the BOLD (blood-oxygen-level-dependent)
signal for the selected ROIs were extracted around peak activation maxima (sphere of 6 x
6 x 6 mm³), averaged across voxels and normalized across participants, per run. Blind
hemodynamic deconvolution of the mean ROI BOLD time series was performed using a
Cubature Kalman filter, which has been shown to be extremely efficient for jointly
estimating latent neural signals and the spatially variable hemodynamic response
functions (HRFs) (Havlicek, Friston, Jan, Brazdil, & Calhoun, 2011). In addition, recent
research has shown that this model is not susceptible to over-fitting and produces
estimates that are comparable to non-parametric methods (Sreenivasan, Havlicek, &
Deshpande, 2015). Hemodynamic deconvolution removes the inter-subject and
inter-regional variability of the HRF (Handwerker, Ollinger, & D'Esposito, 2004) as well
as its smoothing effect and therefore increases the effective temporal resolution of the
signal. The resulting latent neural signals were entered into a first-order dynamic
multivariate autoregressive (dMVAR) model for assessing directed interactions between
multiple nodes as a function of time (Grant, Wood, Sreenivasan, Wheelock, & White, 2015;
Hutcheson et al., 2015; Wheelock et al., 2014) while factoring out influences mediated
indirectly in the set of selected ROIs (Deshpande, Hu, Stilla, & Sathian, 2008;
Deshpande, Sathian, & Hu, 2010a; Stilla, Deshpande, LaConte, Hu, & Sathian, 2007). A
first-order model was implemented because of the interest in causal influences arising
from neural delays, which are less than a TR (Deshpande, Libero, Sreenivasan,
Deshpande, & Kana, 2013). Furthermore, the dMVAR model's coefficients were
allowed to vary as a function of time to obtain condition-specific connectivity values (K.
Sathian, Deshpande, & Stilla, 2013).

Granger connectivity (GC) path weights for conditions of interest (bad advice) for
each agent (human, machine) were extracted. Those corresponding GC path weights
were populated into two samples and independent samples t-tests were employed to
reveal the condition-specific modulations of connectivity (q(FDR) < .05) (Benjamini &
Hochberg, 1995), i.e., those paths which had significantly different effective connectivity
between human and machine agents while receiving bad advice (Appendix A.4). Since
GCA is a data-driven approach, the condition-specific modulation was specifically
chosen for analysis based upon our fMRI results. Effective connectivity of brain regions
(i.e., nodes, edges) was displayed on a brain surface using BrainNet Viewer
(www.nitrc.org/projects/bnv/), a graphical interface visualization tool (Xia, Wang, & He,
2013).
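
The false discovery rate criterion cited above (Benjamini & Hochberg, 1995) can be sketched as a step-up procedure over the per-path p-values. The function name and the example p-values are hypothetical; they are not the path statistics of this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests survive FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at or below q * k / m.
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            largest_passing_rank = rank

    # Step-up rule: every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[index] = True
    return significant


# Hypothetical p-values for eight connectivity paths.
path_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(path_p_values))
```

Because the rule is step-up, a path can survive even when its own p-value exceeds its rank-specific threshold, provided a higher-ranked p-value passes.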

2.4  Results 
Behavioral  Results 

First, we compared the performance between the agent groups and the no agent
group by employing a one-way ANOVA with Agent (human, machine, no agent) as the
between-subjects factor. A significant main effect of Agent (F(2, 31) = 13.85, p <
.0001) was revealed, and post-hoc testing revealed that the no agent group performed
better than the human-agent group (t(20) = -4.06, p = .001) and the machine-agent group
(t(20) = -4.54, p < .0001) (Fig. 1a). Next, we looked at advice utilization, response
times, and monetary deductions. For advice utilization, a significant main effect of Advice
was revealed (F(1, 22) = 7.63, p = .011), indicating that participants accepted good advice
more than bad advice. In addition, a significant three-way interaction of Advice x Time x
Agent was identified (F(1, 22) = 5.06, p = .035), but no significant main effects of Agent
(F(1, 22) = 0.65, p = .429) or Time (F(1, 22) = 2.30, p = .144) and no significant two-way
interaction effects of Advice x Agent (F(1, 22) = 0.56, p = .463), Time x Agent (F(1, 22)
= 2.54, p = .125), and Advice x Time (F(1, 22) = 0.40, p = .536) (Fig. 1b) were found.
Follow-up 2 x 2 ANOVAs showed a significant interaction effect of Time x Agent for
bad advice (F(1, 22) = 5.63, p = .027), but not for good advice (F(1, 22) = 1.23, p =
.279). Follow-up independent samples t-tests revealed that the human-agent group
accepted bad advice less than the machine-agent group during run 2 (t(22) = -1.84, p =
.040).

For response times, significant main effects of Advice (F(1, 22) = 12.26, p = .002)
and Time (F(1, 22) = 5.85, p = .024) were found, indicating that responses were faster
during good compared to bad advice and during run 2 compared to run 1 (Appendix
A.5a). A marginally significant interaction effect of Time x Agent was found
(F(1, 22) = 4.35, p = .049), but no significant main effect of Agent (F(1, 22) =
0.49, p = .491) and no significant interaction effects of Advice x Agent (F(1, 22) = 0.10,
p = .758), Advice x Time (F(1, 22) = 0.07, p = .798), and Advice x Time x Agent (F(1,
22) = 0.06, p = .811) were found.

For  monetary  deductions,  a  significant  main  effect  of  Advice  (F(l,  22)  =  292.45, 
p  <  .0001)  was  revealed,  indicating  that  deductions  were  higher  during  bad  advice 
compared  to  good  advice  (Appendix  A.5b).  In  addition,  a  marginally  significant 
interaction  effect  of  Time  x  Agent  was  found  (F(l,  22)  =  4.61,/?  =  .043),  but  no 
significant  main  effects  of  Time  (F(l,  22)  =  0.31,/?  =  .583)  and  Agent  (F(l,  22)  =  1.56,/? 
=  .224),  or  interaction  effects  of  Advice  x  Agent  (F(l,  22)  =  0.10,/?  =  .758),  Advice  x 
Time  (F(l,  22)  =  0.10,/?  =  .921),  and  Advice  x  Time  x  Agent  (F(l,  22)  =  0.09,/?  =  .768) 
were  found. 
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Figure  1.  False  Alarm  Behavioral  Results 

Results  for  the  Decision  Phase  {M  ±  SEM).  a)  Task  Performance.  The  no  agent  group 
performed  better  than  human-  and  machine-agent  groups,  b)  Advice  Utilization.  Advice 
utilization  during  bad  advice  from  the  human  agent  was  significantly  lower  during  run  2 
compared  to  the  machine  agent. 
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In  addition,  we  looked  at  pre-  and  post-experiment  ratings  (reliability,  trust)  using 
repeated-measures  ANOVAs  with  Time  (run  1,  run  2)  and  Agent  (human,  machine)  as 
factors. The reliability ratings showed no significant main effect of Agent (F(1, 22) =
0.62, p = .439), but a significant main effect of Time (F(1, 22) = 6.54, p = .018) and a
significant interaction effect of Time x Agent (F(1, 22) = 7.86, p = .010) (Fig. 2.2a).
Post-hoc testing revealed that the human agent’s pre-reliability was rated higher than the
machine’s pre-reliability (t(22) = 2.87, p = .009) and the human’s reliability ratings
decreased from pre- to post-experiment (t(11) = 4.10, p = .002). Furthermore, one-sample
t-tests on perceived versus actual reliability (60%) of the agent showed that pre-reliability
ratings were significantly higher than the actual reliability for the human agent (t(11) =
6.79, p < .0001).
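
As a hedged illustration of the one-sample comparison described above, the following minimal sketch computes a one-sample t statistic against the 60% reference reliability. The ratings are made-up placeholder values, not the study's data, and the function name `one_sample_t` is hypothetical; the original analysis software is not specified here.

```python
import math
import statistics


def one_sample_t(values, reference):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    Tests whether the mean of `values` differs from `reference`.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Hypothetical pre-reliability ratings (proportions) for 12 participants.
# These values are placeholders for illustration only.
ratings = [0.80, 0.85, 0.75, 0.90, 0.70, 0.80,
           0.85, 0.75, 0.80, 0.90, 0.70, 0.80]

ACTUAL_RELIABILITY = 0.60  # the agent's actual reliability (60%)

t_stat, dof = one_sample_t(ratings, ACTUAL_RELIABILITY)
print(f"t({dof}) = {t_stat:.2f}")
```

With 12 participants per group the test has 11 degrees of freedom, which matches the t(11) reported for the human-agent group.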

For trust ratings, no significant main effects of Agent (F(1, 22) = 0.26, p = .615)
and Time (F(1, 22) = 3.96, p = .059) were observed, but a significant interaction effect of
Time x Agent (F(1, 22) = 5.S9, p = .026) was demonstrated (Fig. 2.2b). Post-hoc testing
revealed that trust ratings significantly decreased from pre- to post-experiment for the
human agent (t(11) = 4.lS, p = .002). For confidence ratings, no main effect of Agent
(F(1, 22) = 4.16, p = .054) or significant interaction effect of Target x Agent (F(1, 22) =
2.46, p = .131) was found, but a significant main effect of Target (F(1, 22) = 53.44, p <
.0001) was revealed, indicating that confidence was rated higher on target bags compared
to non-target bags (Appendix A.6).

Figure 2. False Alarm Rating Results

Results for Ratings (M ± SEM). a) Pre- and Post-Reliability. Pre-reliability was
higher for the human agent compared to the machine agent. For the human agent,
perceived pre-reliability was significantly higher than the actual reliability of the agent
(60%) and post-reliability ratings significantly decreased. b) Pre- and Post-Trust.
Post-trust was significantly lower than pre-trust for the human agent.

Finally, we analyzed differences in control measures (e.g., demographic
measures and questionnaires) with bivariate Spearman’s ρ correlations and
independent-samples t-tests. For the human-agent group, positive correlations of the NTRS
insecurity score with pre-reliability ratings (r(12) = .73S, p = .006) and pre-trust ratings
(r(12) = .133, p = .007) were found, indicating that a higher insecurity score towards
automation (i.e., greater preference towards human interactions) was
associated with higher pre-reliability and pre-trust ratings. No significant group
differences were identified for any of the control measures (Appendix A.7).

Neuroimaging  Results 

For  the  fMRI  results,  we  looked  at  brain  activations  during  the  decision  and  feedback 
phases  for  the  three-way  interaction.  For  the  decision  phase,  a  significant  three-way 
interaction effect (α < .05, k = 21) was found in the right (R) posterior insula (PI) (BA
13);  R  anterior  precuneus  (aPreC)  (BA  5/7),  left  (L)  aPreC  (BA  5/7);  L  posterior 
cingulate  cortex  (PCC)  (BA  30/31);  L  rostrolateral  prefrontal  cortex  (rlPFC)  (superior 
frontal  gyrus:  SFG;  BA  10);  and  L  posterior  temporoparietal  junction  (pTPJ)  (superior 
temporal  gyrus:  STG;  BA  22)  (Fig.  3,  Fig.  4,  Tab.  1).  The  results  indicate  that  there  was 
higher  activation  during  run  1  for  the  human-agent  group  compared  to  machine-agent 
group during bad advice. For the feedback phase, a significant three-way interaction (α <
.05, k = 14) was found in the L dorsomedial prefrontal cortex (dmPFC) (medial frontal
gyrus: MFG; BA 9/10) showing higher activation for the human agent during run 2 for
good compared to bad advice (Fig. 5, Tab. 1). Note that no further post-hoc comparisons
were performed on the extracted data from the decision or feedback phases to avoid
non-independent analyses, or double dipping (Kriegeskorte, Simmons, Bellgowan, & Baker,
2009).


Figure  3.  False  Alarm  Brain  Activations  for  Decision  Phase 

(α < .05, k = 21). The three-way interaction (Advice x Run x Agent) during the decision
phase significantly activated the right posterior insula (PI), right anterior precuneus
(aPreC), left aPreC, left posterior cingulate cortex (PCC), left rostrolateral prefrontal
cortex (rlPFC) and left posterior temporoparietal junction (pTPJ).

Figure 4. False Alarm Activation Patterns During Decision Phase

The activation pattern indicates higher activation for the human- compared to
machine-agent group for bad advice during run 1. The bar plots shown are for visualization
purposes.

Figure 5. False Alarm Brain Activations During Feedback Phase
(α < .05, k = 14). The three-way interaction (Advice x Run x Agent) during the feedback
phase  significantly  activated  the  left  dorsomedial  prefrontal  cortex  (dmPFC).  The 
activation  pattern  shows  lower  activation  for  bad  advice  compared  to  good  advice  during 
run 2 for the human agent. The bar plot serves as a visual aid for the activation pattern.

Table 1. False Alarm Brain Regions

Brain Regions Associated with the Three-Way Interaction. Brain regions showing
significant activation clusters during the decision (minimum cluster of 21) and
feedback (minimum cluster of 14) phases (α < .05, cluster-level threshold corrected). PI,
posterior insula (BA 13); aPreC, anterior precuneus (BA 5/7); PCC, posterior cingulate
cortex (BA 30/31); rlPFC, rostrolateral prefrontal cortex (BA 10); pTPJ, posterior
temporoparietal junction (BA 22); dmPFC, dorsomedial prefrontal cortex (BA 9/10).


Region                                     F(1, 22)   Cluster size (mm³)     x      y      z

Decision phase (Advice x Run x Agent)
Right posterior insula                      32.86           854              36    -15     21
Right anterior precuneus                    18.65           593              18    -42     45
Left anterior precuneus                     21.52          2214              -6    -42     51
Left posterior cingulate cortex             24.96           607              -3    -63     15
Left rostrolateral prefrontal cortex        17.34           692             -21     45     21
Left posterior temporoparietal junction     23.58          1678             -48    -45      9

Feedback phase (Advice x Run x Agent)
Left dorsomedial prefrontal cortex          25.03           655              -6     51     12

Effective  Connectivity  Results 

Based  on  our  fMRI  results,  we  implemented  multivariate  GCA  to  identify  effective 
connectivity  among  brain  regions  during  the  decision  phase  when  comparing  the  human 
with the machine agent during bad advice for run 1 (all connections survived q(FDR) <
.05, except the connections to the L rlPFC, which survived q(FDR) < .08) (Tab. 2).
Analysis for the feedback phase was not performed because only one region
survived in the fMRI results. The L aPreC and PI were identified as the source ROIs;
they were the drivers of the network, making reciprocal connections to each other while
also both sending output connections to all target ROIs (R aPreC, PCC, rlPFC and pTPJ)
(Fig. 6).

Figure 6. False Alarm Results for Multivariate Granger Causality Analysis

The  effective  connectivity  network  for  bad  advice  during  the  decision  phase  for  run  1 
when  comparing  the  human  with  machine  agent  showed  that  the  PI  (posterior  insula)  and 
L  aPreC  (anterior  precuneus)  were  drivers  of  the  network  and  also  the  source  ROIs  for  all 
other target ROIs (R aPreC, PCC (posterior cingulate cortex), rlPFC (rostrolateral
prefrontal cortex) and pTPJ (posterior temporoparietal junction)). Note that all connections
survived q(FDR) < .05, except the connections to rlPFC, which survived q(FDR) < .08. The
color bar represents the t-value of the comparisons shown in Table 2.

Table 2. False Alarm Granger Causality Analysis

Path  Weights  for  Granger  Causality  Analysis.  The  path  weights  displayed  show 
significant  effective  connectivity  paths  that  are  stronger  in  the  human-agent  group 
compared to the machine-agent group during run 1 (all connections survived q(FDR) <
.05, except the connection to rlPFC, which survived q(FDR) < .08). The directionality of
the  connectivity  is  shown  in  the  first  two  columns,  with  the  source  column  showing  the 
ROIs  that  predict  activation  in  the  target  column  ROIs.  The  strength  of  connectivity  is 
given  by  the  mean  path  weights  in  the  third  column.  PI,  posterior  insula;  aPreC,  anterior 
precuneus;  PCC,  posterior  cingulate  cortex;  rlPFC,  rostrolateral  prefrontal  cortex;  pTPJ, 
posterior  temporoparietal  junction. 


Source     Target     Path weight   Path weight   t value   p value
                      (Human)       (Machine)

PI         R aPreC       0.23          0.18         4.06    2.80 × 10^?
PI         L aPreC       0.18          0.19         2.57    5.16 × 10^?
PI         PCC           0.27          0.18         3.96    4.16 × 10^?
PI         rlPFC         0.16          0.18         2.32    1.04 × 10^?
PI         pTPJ          0.17          0.15         2.52    6.02 × 10^?

L aPreC    PI            0.18         -0.17         2.42    7.80 × 10^?
L aPreC    R aPreC       0.18         -0.12         2.44    7.51 × 10^?
L aPreC    PCC           0.20         -0.15         3.47    2.79 × 10^?
L aPreC    rlPFC         0.16         -0.15         2.01    2.22 × 10^?
L aPreC    pTPJ          0.24         -0.21         3.12    9.39 × 10^?
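
The notion of a "source" region predicting activity in a "target" region, as used in the Granger causality analysis reported in this section, can be illustrated with a minimal sketch. The code below is a simplified bivariate Granger F-test on simulated signals, written under stated assumptions: it is not the multivariate GCA pipeline used in the study, the signals `x` and `y` are synthetic, and the function names (`lagged`, `residual_sum_of_squares`, `granger_f`) are hypothetical.

```python
import numpy as np


def lagged(series, order):
    """Stack lag-1..lag-`order` copies of `series`, aligned with series[order:]."""
    n = len(series)
    return np.column_stack(
        [series[order - k - 1 : n - k - 1] for k in range(order)]
    )


def residual_sum_of_squares(design, target):
    """Fit ordinary least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(source, target, order=1):
    """F statistic for 'source Granger-causes target' (bivariate case).

    Compares a restricted model (target's own past) with a full model
    (target's past plus source's past).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    y = target[order:]
    intercept = np.ones((len(y), 1))
    restricted = np.hstack([intercept, lagged(target, order)])
    full = np.hstack([restricted, lagged(source, order)])
    rss_restricted = residual_sum_of_squares(restricted, y)
    rss_full = residual_sum_of_squares(full, y)
    dof = len(y) - full.shape[1]
    return ((rss_restricted - rss_full) / order) / (rss_full / dof)


# Simulated example: x drives y with a one-sample delay, but not vice versa.
rng = np.random.default_rng(0)
n_samples = 500
x = rng.standard_normal(n_samples)
y = np.zeros(n_samples)
for t in range(1, n_samples):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(f"F(x -> y) = {f_x_to_y:.1f}, F(y -> x) = {f_y_to_x:.1f}")
```

In this toy case the past of `x` sharply improves the prediction of `y`, so the F statistic for the x-to-y direction is much larger than for the reverse direction. A multivariate analysis would include the past of all ROIs in both models rather than only two signals.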

2.5  Discussion 

The  purpose  of  this  research  was  to  understand  the  neural  basis  and  corresponding 
effective  connectivity  network  involved  during  advice  utilization  from  human  and 
machine  agents  framed  as  experts.  To  provide  a  greater  understanding  of  the  behavioral 
and  neural  underpinnings  associated  with  advice  taking,  we  manipulated  agent  reliability 
with a high false alarm rate to reveal the decision-making processes during good and bad
advice. We first revealed that unreliable advice decreased performance, which has been
previously reported by other behavioral studies investigating advice differences between
humans and machines (Dzindolet et al., 2002; Madhavan & Wiegmann, 2007a). An
earlier study investigating credibility found that advice utilization decreased for expert
automation but not for expert humans; however, this study focused entirely on misses and
false alarms, which could account for any differences between these earlier findings and
ours (Madhavan & Wiegmann, 2007a). In addition, a study investigating perception
during a contrast-detection task showed that false alarms evoked more cortical activity
when compared to misses, which supports the notion that participants’ percepts may vary
when presented with different types of errors (Ress & Heeger, 2003). In our study, we
focused only on false alarms since there is evidence of distinct neuronal activity
associated with false alarms when compared to misses and behavioral studies have
demonstrated differences between the two error types (Dixon et al., 2007; McBride,
Rogers, & Fisk, 2014).

Contrary to our hypothesis, the behavioral results revealed that the decline in
advice utilization was greater for the human agent compared to the machine agent. We
expected that advice utilization would degrade faster for the machine agent because of
differences in association of dispositional credibility; however, our results indicate that
false alarms weighed more heavily on the human-agent group. Our findings provide
evidence that although assignment of personal traits may have been higher for the human
agent, the prevalence of false alarms may have altered evaluations of performance levels
due to the type of error presented. Furthermore, to reveal any preconceived notions that
participants had about the human and machine agents, we examined whether the
perceived  pre-reliability  differed  from  the  actual  reliability  for  each  agent.  Interestingly, 
the  human  agent’s  pre-reliability  was  rated  significantly  higher  than  the  actual  reliability, 
showing  that  the  human-agent  group  expected  their  advisor  to  be  more  reliable.  Our 
finding  supports  other  behavioral  studies  that  indicate  that  preconceived  notions  can 
influence participants’ perceptions of advice (Madhavan & Wiegmann, 2007b).
Pre-reliability and pre-trust ratings for the human agent showed a positive association with
insecurity scores for embracing new technologies, indicating that participants interacting
with the human agent had initial inclinations that tended towards human interactions.
These findings indicate that participants interacting with the human agent could
conceivably have built a mental model of their expectations about the agent’s credibility, and
deviations from expected behavior likely caused a reevaluation of the human agent’s
performance (Burgoon, 1993). The change in perspectives would ultimately cause a shift
towards self-reliance and possibly increased responsibility/accountability for the outcome
of their decisions (Dzindolet et al., 2002). Post-reliability ratings for the human-agent
group showed a shift towards the actual reliability of the agent, which indicates that the
human-agent group was able to discern the agent’s performance and recalibrate their
expectations. Moreover, post-trust was lower than pre-trust for the human agent, supporting
previous evidence that false alarms degrade trust (Dixon et al., 2007; Rice & McCarley,
2011). Lastly, our results cannot be explained by any of our control measures or
confidence ratings because we found no differences between the agent groups.

Moreover, our results revealed that advice utilization decreased during bad advice
compared  to  good  advice.  Since  bad  advice  was  advice-incongruent,  it  could  have 
created  a  mismatch  between  what  the  participants  perceived  and  what  they  were  advised, 
resulting in disconfirmation experiences. The discrepancies during
advice-disconfirmation experiences most likely led to skepticism during bad advice and
ultimately degradation of advice utilization. As a consequence, response times for both
groups  were  slower  during  bad  advice,  since  participants  had  more  conflicting  perceptual 
processes  (advice-incongruencies).  In  addition,  monetary  deductions  were  higher  overall 
for  bad  advice,  indicating  that  bad  advice  caused  participants  to  make  more  erroneous 
decisions. 

Subsequently,  we  identified  the  neural  basis  and  effective  connectivity  of  the 
underlying  brain  network  associated  with  advice  utilization.  On  the  neural  level,  we  had 
two expectations regarding brain activity. First, we expected activation differences in
regions associated with attribution of personal traits and dispositions (Brosch et al.,
2013; Harris et al., 2005), and second, when comparing the agent groups during bad
advice over time, we expected that brain regions such as the precuneus and posterior
cingulate cortex would be the drivers of the advice utilization network. Our neuroimaging results
revealed brain regions associated with domain-general large-scale networks, such as the
default-mode network (left pTPJ, bilateral aPreC, left PCC) typically engaged in social
evaluations, the salience network (PI) for detection of internal and external salient events,
and the central-executive network (left rlPFC) implicated in higher-order executive
functions (Menon, 2011). Similar to our fMRI hypotheses, on the effective
connectivity level, we theorized that a network would be differentially involved when
comparing the human to the machine agent for bad advice during run 1. Our effective
connectivity  analysis  revealed  that  left  aPreC  and  PI  were  drivers  of  the  network  that 
were  reciprocally  connected  to  each  other.  The  aPreC  and  PI  acted  as  centralized  hubs  of 
the network, presumably by integrating social evaluations (e.g., judgments about others’
intentions  and  personal  traits)  (Cavanna  &  Trimble,  2006)  with  interoception  (e.g., 
recruitment  of  physiological  responses  to  environmental  cues)  (Kurth  et  al.,  2010). 
Previous  evidence  supports  the  notion  that  integration  of  subjective  mental  states  (PreC) 
and  information  about  internal  bodily  states  (anterior  insula,  AI)  are  important  for 
awareness  of  one’s  emotional  state  (Terasawa,  Fukushima,  &  Umeda,  2013).  Since 
participants  interacting  with  the  human  agent  could  have  had  greater  conceptualization  of 
the  discrepancies  between  the  actual  and  perceived  reliability,  this  could  have  led  to  a 
visceral  response  (PI)  to  the  unreliable  advice  in  conjunction  with  association  of  personal 
traits  (aPreC)  during  interactions  with  the  agent. 

Furthermore,  our  effective  connectivity  results  indicated  that  both  hubs  (left 
aPreC,  PI)  had  directional  influences  on  all  other  regions  (right  aPreC,  left  pTPJ,  PCC, 
and  left  rlPFC)  to  guide  decision-making  processes  during  advice  utilization.  PreC 
activation  has  been  identified  during  a  comparison  of  other-  versus  self-attribution, 
showing  the  involvement  of  this  region  during  causal  attributions  towards  another  (Farrer 
&  Frith,  2002).  In  addition,  PCC  activation  has  been  implicated  in  adapting  behaviors 
(Pearson,  Heilbronner,  Barack,  Hayden,  &  Platt,  2011)  and  self-reflection  (Johnson  et  al., 
2002), while the pTPJ has been shown to be activated during social cognitions such as
determining intentionality of others (Mars et al., 2012). Other fMRI studies investigating
expert advice have shown activation in PCC and PreC during no advice conditions
(Engelmann et al., 2009) and in regions such as PCC, insula and medial frontal gyrus
when comparing advice vs. no advice in experts and peers (Suen et al., 2014); however,
we did not expect equivalent results since our experimental design looked at differences
between humans and machines. Furthermore, we found directional influences to the
rlPFC, which is part of the central-executive network and has been shown to be involved in
reasoning (Christoff et al., 2001) and in making uncertain decisions (Badre, Doll,
Long, & Frank, 2012).

In  addition  to  our  results  for  the  decision  phase,  we  also  expected  participants  to 
have  a  heightened  awareness  of  bad  advice  due  to  feedback,  which  would  ultimately  lead 
to  a  behavioral  adjustment  in  advice  utilization  over  time.  During  the  feedback  phase,  we 
found  activation  in  the  dmPFC,  which  coincides  with  another  study  that  showed  dmPFC 
activity  during  feedback  after  iterative  trials  with  the  same  advisor  (Behrens,  Hunt, 
Woolrich,  &  Rushworth,  2008).  The  dmPFC  has  been  shown  to  be  involved  with  social 
cognition (Amodio & Frith, 2006) and during inferences about others’ goals and traits
(Krueger,  Grafman,  &  McCabe,  2008;  Van  Overwalle,  2009).  In  our  study,  participants 
interacting  with  the  human  agent  showed  lower  dmPFC  activation  during  bad  compared 
to  good  advice  toward  the  end  of  the  experiment,  which  shows  that,  as  participants 
ascertained  that  the  human  agent  was  unreliable,  they  could  have  placed  lower  value  on 
bad advice while receiving feedback.

Our study had a few limitations that should be addressed. First, we looked at
differences between good and bad advice by manipulating agent reliability with only
false  alarms.  Future  studies  could  elaborate  on  our  findings  by  investigating  how  misses 
degrade  advice  utilization  between  humans  and  machines  and  the  effective  connectivity 
network  associated  with  those  differences.  Furthermore,  to  prevent  cognitive  anchoring, 
or  the  tendency  to  rely  too  heavily  on  the  first  piece  of  information  acquired,  we  had 
participants  receive  advice  before  they  made  their  decisions,  rather  than  receiving  advice 
after  they  made  their  decisions.  Cognitive  anchoring  has  been  shown  to  decrease  reliance 
on  automated  aids  during  self-generated  decisions  (Madhavan  &  Wiegmann,  2005)  and 
future studies could investigate this phenomenon by implementing a paradigm where
participants  receive  advice  after  they  make  their  decisions. 

In summary, our findings provide insight into the behavioral and neural factors
involved with advice utilization from humans and machines and the differences that
account for those behaviors. Our results have significant implications for society because
of  progressions  in  technology  and  increased  interactions  with  machines.  A  greater 
discernment  of  the  various  facets  involved  with  machine  interactions  will  ultimately 
serve  to  calibrate  behavioral  responses  and  to  optimize  future  safety  guidelines. 
Understanding  the  variables  and  environmental  differences  involved  during  advice  taking 
will  allow  for  substantive  information  to  improve  security  and  ultimately  prevent 
potential catastrophic disasters.

CHAPTER THREE: THE IMPACT OF MISSES ON ADVICE UTILIZATION


3.1  Abstract 

Our  objective  was  to  reveal  the  underlying  neural  mechanisms  during  advice  utilization 
from  expert  human  and  machine  agents  with  fMRI  and  multivariate  Granger  causality 
analysis.  As  society  becomes  more  reliant  on  machines  and  automation,  understanding 
how people utilize advice is a necessary endeavor. The impact of misses on
decision-making and the neural basis involved with advice taking need further exploration.
During the X-ray luggage-screening task, participants accepted or rejected good or bad
advice  from  either  the  human  or  machine  agent  framed  as  experts  with  manipulated 
reliability  (high  miss  rate).  We  showed  that  unreliable  advice  decreased  performance  and 
the  machine-agent  group  decreased  their  advice  utilization  compared  to  the  human-agent 
group.  The  differences  in  behaviors  during  advice  utilization  could  be  accounted  for  by 
high  expectations  of  reliable  advice  and  differences  in  attention  allocation  due  to  miss 
errors.  Areas  involved  with  the  salience  and  mentalizing  networks,  as  well  as  sensory 
processing  involved  with  attention,  were  recruited  during  the  task.  The  advice  utilization 
network  consisted  of  attentional  modulation  of  sensory  information  with  the  lingual  gyrus 
as  the  driver  during  the  decision  phase  and  the  fusiform  gyrus  as  the  driver  during  the 
feedback  phase.  Our  behavioral  and  fMRI  results  provide  evidence  demonstrating  that 
miss errors from agents framed as experts decrease advice utilization due to reevaluation
of expectations. Assessment of the behavioral and neural mechanisms during unreliable
advice can expand on the existing literature on miss errors, while also providing a neural
network involved with advice utilization from humans and machines.

3.2 Introduction


People  are  often  given  numerous  options  regarding  the  type  and  source  of  advice  they  can 
receive.  For  example,  when  individuals  travel  to  a  new  country,  they  can  ask  a  native 
citizen  or  use  a  smartphone  with  a  Global  Positioning  System  (GPS)  for  directions. 

Given  the  different  options  available,  it  is  becoming  a  necessity  to  understand  how 
individuals  utilize  or  discount  advice  from  different  sources.  Factors  such  as  source 
credibility  (expert  and  novice)  (Madhavan  &  Wiegmann,  2007;  Van  Swol  &  Sniezek, 
2005)  and  initial  expectations  of  reliable  advice  (Dzindolet,  Pierce,  Beck,  &  Dawe,  2002) 
can  influence  how  someone  responds  to  advice.  Dzindolet  et  al.  (2002)  proposed  that 
individuals  may  possess  a  “perfect  automation  schema,”  which  is  an  expectation  that 
automation  performs  near  perfectly  and  can  ultimately  cause  a  person  to  disuse  the  advice 
given  to  them  when  errors  occur.  Initial  expectations  of  reliable  advice  can  be  impacted, 
however,  when  disconfirmation  evidence  of  misleading  advice  is  encountered. 

Fully understanding the influence of bad advice on decision-making behaviors
requires  an  examination  of  error  types:  false  alarms  and  misses.  The  type  of  error  is  of 
particular  interest  because,  while  a  false  alarm  error  is  misleading,  it  is  not  necessarily 
harmful.  In  contrast,  a  miss  error  can  lead  to  disastrous  results  such  as  a  luggage- 
screener  failing  to  detect  a  bomb  in  a  suitcase.  Previous  evidence  has  shown  that  false 
alarms  can  cause  a  “cry  wolf  effect,”  in  which  an  individual  may  tend  to  ignore  true  alerts 
(Breznitz,  2013)  and  misses  may  affect  monitoring  strategies  leading  to  an  adaptation  of 
attention  allocation  (Onnasch,  Ruff,  &  Manzey,  2014).  False  alarms  have  been  shown  to 
decrease trust and decrease reliance and compliance, while misses have been shown to
only decrease reliance (Dixon, Wickens, & McCarley, 2007; Rice & McCarley, 2011).
Furthermore, studies comparing humans and machines have shown that expert humans
were trusted more than expert machines due to differences in dispositional credibility
(Madhavan & Wiegmann, 2007) and allocation of tasks to humans compared to
automation can be affected by trust in automation (Lewandowsky, Mundy, & Tan, 2000).
To expand on the existing literature on humans and machines, we previously investigated
the impact of false alarms on decision-making behaviors (Goodyear et al., 2015,
submitted), and to elaborate on those findings, the current study examined misses.

The neural processes involved with advice taking have been recently investigated
with functional magnetic resonance imaging (fMRI) advice-taking paradigms examining
expert advice (Boorman, O'Doherty, Adolphs, & Rangel, 2013; Meshi, Biele, Korn, &
Heekeren, 2012) and adaptive learning (Biele, Rieskamp, Krugel, & Heekeren,
2011). Furthermore, neuroimaging studies have examined interactions between humans and
robots during perspective taking (Krach et al., 2008) and during social observations
(Wang & Quadflieg, 2015). The default-mode network (e.g.,
temporoparietal junction, precuneus) and the salience network (dorsal anterior cingulate
cortex, insulae) have been additionally implicated in other advice-taking tasks
(Engelmann, Capra, Noussair, & Berns, 2009), as well as during robot-human interaction
paradigms (Chaminade et al., 2012). However, in spite of the existing literature on
advice taking, the neural basis and underlying brain networks associated with miss errors
from expert humans and machines remain to be elucidated.

We implemented an X-ray luggage-screening task with fMRI combined with
multivariate  Granger  causality  analysis  (GCA)  to  investigate  the  impact  of  misses  on 
decision-making  behaviors  and  to  reveal  the  underlying  brain  network  associated  with 
advice  utilization  from  unreliable  agents  framed  as  experts.  Based  upon  previous  studies 
investigating misses and false alarms (Dzindolet et al., 2002; McBride, Rogers, & Fisk,
2014),  we  first  hypothesized  that  unreliable  advice  would  decrease  performance  (i.e., 
accuracy)  compared  to  no  advice.  Furthermore,  we  expected  advice  utilization  to 
decrease  due  to  the  significance  of  a  miss  error  and  due  to  disconfirmation  evidence 
about  the  agents’  expertise  provided  by  feedback.  We  expected  the  reevaluation  of  the 
agents’  perceived  credibility  to  cause  a  mismatch  of  perceptions  due  to  advice- 
incongruencies,  which  would  ultimately  cause  an  adjustment  in  attention  allocation 
strategies.  In  addition,  based  upon  previous  work  investigating  advice  acceptance  and 
trust  between  expert  human  and  machine  agents  (Madhavan  &  Wiegmann,  2007),  we 
expected  participants  interacting  with  the  machine  agent  to  have  a  greater  depreciation  of 
advice  utilization  compared  to  the  human  agent  due  to  perceptions  involved  with  the 
perfect  automation  schema  and  varying  degrees  of  perceived  dispositional  credibility. 
Finally, we expected brain regions involved with self-processing (e.g., precuneus) and
error monitoring and salience detection (e.g., anterior cingulate cortex) to be recruited
when comparing the human agent and the machine agent due to deviations in expectations
(agents framed as experts), resulting from a change in attention strategies from a high
miss rate.

3.3 Methods


Subjects 

A  normative  rating  study  and  behavioral  study  were  conducted  at  George  Mason 
University  (GMU)  and  an  fMRI  study  was  conducted  at  Auburn  University  (AU).  All 
studies  were  conducted  according  to  the  ethical  guidelines  and  principles  of  the 
Declaration of Helsinki. For the normative rating study, twenty-three male students (age
(M ± SD) = 24.0 ± 2.6) participated to standardize the X-ray luggage images for the
experimental studies. For the behavioral study, twelve volunteers (7 males, 5 females;
age = 20.9 ± 3.4) participated to complete an X-ray luggage-screening task without
receiving advice. For the fMRI study, twenty-four healthy right-handed volunteers (14
males, 10 females; age = 22.3 ± 2.4) participated in the X-ray luggage-screening task
while receiving advice. Participants gave written consent approved by the Institutional
Review Boards at GMU and AU and they received financial compensation for their
participation (see Goodyear et al., 2015, submitted, for details on methods).

X-ray  Luggage-Screening  Task 

Participants  partook  in  an  X-ray  luggage-screening  task  and  were  asked  to  search  for  the 
presence or absence of a knife (Madhavan & Gonzalez, 2006) (Appendix B.1a). In the
behavioral study, participants performed the task unassisted without receiving advice (no
agent group). In the fMRI study, participants were assigned to either the human-agent
group or the machine-agent group with 60% reliability and they received good
(advice-congruent) and bad (advice-incongruent) advice (Appendix B.1b).
The jitter times were generated by an fMRI simulator software
(http://www.mccauslandcenter.sc.edu/CRNL/tools/fmrisim)  and  consisted  of  a  minimum 
of  one  second  and  an  average  of  four  seconds  to  optimize  timing.  Participants  responded 
by  using  fiber  optic  response  pads  (Current  Designs,  http://www.curdes.com/);  they  were 
given  an  initial  endowment  of  $40  and  each  incorrect  answer  resulted  in  a  deduction  of 
$0.30  from  the  remaining  total.  Performance,  advice  utilization,  response  times  and 
monetary  deductions  were  collected  during  the  experiment.  The  stimuli  were  presented 
using  E-Prime  2.0  (Psychology  Software  Tools,  Inc.). 
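
The jitter constraint described above (a minimum of one second and an average of four seconds) can be illustrated with a short sketch. The report does not describe the algorithm used by the simulator software, so the shifted-exponential draw, the function name draw_jitters and the seed below are assumptions made purely for illustration.

```python
import random

def draw_jitters(n_trials, minimum=1.0, mean=4.0, seed=0):
    """Draw jitter times (seconds) with a hard minimum and a target mean.

    Assumption: a shifted exponential distribution. The report does not
    state which distribution the fMRI simulator software actually used.
    """
    rng = random.Random(seed)
    excess_mean = mean - minimum  # mean of the exponential part (3 s here)
    return [minimum + rng.expovariate(1.0 / excess_mean) for _ in range(n_trials)]

jitters = draw_jitters(10000)
average_jitter = sum(jitters) / len(jitters)
```

Every draw is at least one second by construction, and the sample average approaches four seconds as the number of trials grows.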

Procedure 

Pre-Experimental  Phase.  Participants  completed  self-report  questionnaires  as  control 
measures  to  investigate  individual  differences  approximately  one  to  two  weeks  before  the 
fMRI  experiment.  The  control  measures  included:  Interpersonal  Reactivity  Index  (IRI) 
(Davis,  1983),  Complacency-Potential  Rating  Scale  (CPS)  (Singh,  Molloy,  & 
Parasuraman,  1997),  National  Readiness  Technology  Scale  (NTRS)  (Parasuraman,  2000), 
NEO  Five-Factor  Inventory  (NEO-FFI)  (Costa  &  McCrae,  1992),  and  Propensity  to  Trust 
(PTT)  (Merritt,  Heimbaugh,  LaChapell,  &  Lee,  2013). 

Experimental  Phase.  Participants  completed  a  practice  run  where  they  read  descriptions 
about  the  human  or  machine  agent  (reliability  was  not  disclosed),  rated  their  trust  in  and 
reliability  of  the  human  or  machine  agent  on  a  10-point  Likert  scale  (0  =  very  low,  10  = 
very  high),  familiarized  themselves  with  the  five  possible  knives  that  could  be  present  in 
the bags and then completed four practice trials of the task. The participants then
completed  two  experimental  runs  of  the  task  while  in  the  scanner  and  rated  reliability  and 
trust  afterwards. 

Post-Experimental Session. After completion of the fMRI experiment, participants were
asked  to  rate  their  confidence  in  finding  the  target  (i.e.,  knife)  in  each  of  the  images 
presented  during  the  experiment  on  a  10-point  Likert  scale  (1  =  very  low,  10  =  very 
high). 

Neuroimaging  Acquisition 

Imaging  data  were  acquired  on  a  7T  actively  shielded  whole-body  scanner  (Siemens 
Magnetom)  with  a  32-channel  head  coil  (Nova  Medical)  at  AU  MRI  Research  Center, 
Auburn, Alabama. The anatomical imaging data were based on a 3D T1-weighted
MPRAGE sequence with TR = 2020 ms, TE = 2.7 ms, flip angle = 7°, slice thickness =
1.2 mm, voxel dimensions = 1.1 mm x 1.1 mm x 1.2 mm and number of slices = 240. The
functional imaging data were based on a 2D gradient-echo multiband EPI sequence with
TR = 1000 ms, TE = 20 ms, flip angle = 70°, slice thickness = 2 mm, voxel dimensions =
2.1 mm x 2.1 mm x 2.0 mm, number of slices = 45 per volume in an axial orientation
parallel  to  the  anterior-posterior  commissure  and  a  multiband  factor  of  2.  The  first  two 
volumes  were  discarded  to  allow  for  T1  equilibrium  effects  and  a  total  of  660  volumes 
were  taken  for  each  run. 
Behavioral Data Analysis

Behavioral data were analyzed with the Statistical Package for the Social Sciences 20.0
(SPSS 20.0, IBM Corp.) and alpha was set to p < .05 (two-tailed). Data were
normally distributed (Kolmogorov-Smirnov test) and assumptions for analyses of
variance (Bartlett’s test) were not violated. To investigate task performance between the
agent groups and the no agent group, a one-way analysis of variance (ANOVA) was conducted
with Agent (human, machine, no agent) as the between-subjects factor. Mixed 2x2x2
repeated-measures ANOVAs with Advice (good, bad) and Time (run 1, run 2) as within-subjects
factors and Agent (human, machine) as the between-subjects factor were employed to
examine advice utilization, response times and monetary deductions. In addition, we
investigated reliability, trust and confidence ratings with mixed 2x2 repeated-measures
ANOVAs with Agent (human, machine) as the between-subjects factor. The within-subjects
factor was Time (pre, post) for the reliability/trust ratings and Target (yes, no) for the
confidence ratings.
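
The one-way ANOVA described above partitions variance into between-group and within-group components. The following is a minimal sketch of that computation, not the SPSS procedure used in the study; the function name one_way_anova and the accuracy scores are hypothetical and do not come from the study data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical accuracy scores (illustration only, not data from the study)
human = [0.60, 0.65, 0.62]
machine = [0.58, 0.63, 0.61]
no_agent = [0.70, 0.74, 0.72]
f_value, df_between, df_within = one_way_anova([human, machine, no_agent])
```

With three groups of twelve participants each, the same formula gives 2 and 33 degrees of freedom, which matches the F(2, 33) reported in the Results.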

Neuroimaging  Data  Analysis 

The fMRI data were analyzed with NeuroElf software (http://neuroelf.net) and
BrainVoyager QX 2.8 (Brain Innovation). The functional imaging data were
preprocessed using Statistical Parametric Mapping (SPM, Wellcome Department of
Cognitive Neurology) functions batched via NeuroElf, including three-dimensional
motion correction (six parameters) and slice-scan time correction (temporal interpolation). A
mean functional image was computed for each participant across all runs and was then
co-registered with the anatomical images using a joint histogram for the different contrast
types. Preprocessing procedures for the anatomical images included segmentation
with a unified segmentation procedure (Ashburner & Friston, 2005), and spatial warping
was applied to the functional images to normalize the data to a standard Montreal
Neurological Institute (MNI) brain template. To account for any residual differences
across participants, spatial smoothing (Gaussian filter of 6 mm FWHM) was applied to
the images.

A general linear model (GLM) corrected for first-order serial correlations was
fit to the data (Friston, Harrison, & Penny, 2003). The model consisted of thirty-seven
regressors based on advice utilization (accept, reject), advice type (good, bad), time (run
1, run 2) for each of the five phases (fixation, advice, bag, decision, feedback) and seven
parametric regressors of no interest for the global signal and 3D motion correction
(translations in X, Y, Z directions, rotations around X, Y, Z axes). The regressor time
courses were adjusted for the hemodynamic response delay by convolution with a
dual-gamma canonical hemodynamic response function (Büchel, Holmes, Rees, & Friston,
1998). Random-effect analyses were performed at the multi-subject level to explore
brain activations associated with the decision and feedback phases during advice
utilization.
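
The convolution step described above can be sketched as follows. The shape parameters (6 for the peak, 16 for the undershoot, undershoot ratio 1/6) are commonly used defaults and are an assumption here; the report does not state which values NeuroElf or BrainVoyager applied. The function names and the single-event time course are hypothetical.

```python
import math

def dual_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Dual-gamma hemodynamic response at time t (seconds), unit scale.

    Assumption: default-style parameters; not taken from the report.
    """
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    undershoot = t ** (undershoot_shape - 1) * math.exp(-t) / math.gamma(undershoot_shape)
    return peak - undershoot_ratio * undershoot

def convolve_with_hrf(event_series, tr=1.0, hrf_length=32.0):
    """Convolve an event time course (one value per volume) with the HRF."""
    kernel = [dual_gamma_hrf(i * tr) for i in range(int(hrf_length / tr))]
    n = len(event_series)
    out = [0.0] * n
    for i, amplitude in enumerate(event_series):
        if amplitude == 0:
            continue
        for j, h in enumerate(kernel):
            if i + j < n:
                out[i + j] += amplitude * h
    return out

# Single hypothetical event at volume 5, sampled at TR = 1 s as in the functional runs
events = [0.0] * 40
events[5] = 1.0
regressor = convolve_with_hrf(events)
peak_index = max(range(len(regressor)), key=lambda i: regressor[i])
```

Under these assumed parameters the modeled response peaks about five seconds after the event and then dips below baseline, which is the delay the convolution is meant to capture.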

Mixed  2x2x2  ANOVAs  on  parameter  estimates  were  applied  with  Advice 
(good,  bad)  and  Time  (run  1,  run  2)  as  within-subjects  factors  and  Agent  (human, 
machine)  as  the  between-subjects  factor.  Brain  activations  for  the  decision  and  feedback 
phases  were  reported  after  correcting  for  multiple  comparisons  using  a  cluster-level 
statistical threshold (Cluster-level Statistical Threshold Estimator plugin in BrainVoyager
QX). The thresholded map (p < .005) was used for a whole-brain correction criterion,
which is based on an estimate of the map’s spatial smoothness and on a Monte Carlo
simulation (1,000 iterations). The minimum cluster size at a specified confidence level (α
= 0.05) was then calculated (Forman et al., 1995; Goebel, Esposito, & Formisano, 2006).
The significant activation clusters were displayed in MNI space on an anatomical brain
template reversed left to right (i.e., radiological convention).

Effective  Connectivity  Analysis 

Effective (or directional) connectivity data were analyzed using code developed
in-house in MATLAB (www.mathworks.com) (Grant et al., 2014; Lacey, Stilla,
Sreenivasan, Deshpande, & Sathian, 2014) (for more details on methods see Appendix
B.2). Effective connectivity in the network of activated regions was assessed
through multivariate Granger causality analysis (GCA), and only regions that survived the
fMRI analysis threshold for the main effect of Agent (human, machine) for the decision
and feedback phases were selected as ROIs. Time series of the blood-oxygen-level-dependent
(BOLD) signal for the selected ROIs were extracted around peak activation
maxima (sphere of 6 x 6 x 6 mm³), averaged across voxels and normalized across
participants, per run. Blind hemodynamic deconvolution of the mean ROI BOLD time
series was performed using a Cubature Kalman filter and smoother (Havlicek, Friston,
Jan, Brazdil, & Calhoun, 2011) and the resulting latent neural signals were entered into a
first-order dynamic multivariate autoregressive (dMVAR) model to assess directed
interactions of multiple nodes as a function of time (Feng et al., 2015; Grant, Wood,
Sreenivasan, Wheelock, & White, 2015; Hampstead, Khoshnoodi, Yan, Deshpande, &
Sathian, 2016; Hutcheson et al., 2015; Wheelock et al., 2014).
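
The idea behind a first-order autoregressive model of directed influence can be sketched with two nodes. This is a deliberately simplified, static version: the dMVAR model described above lets the coefficients vary over time and operates on deconvolved signals, neither of which is reproduced here. The function name fit_var1, the simulated signals and the coupling values are all hypothetical.

```python
import random

def fit_var1(x, y):
    """Least-squares fit of [x_t, y_t] = A [x_(t-1), y_(t-1)] + noise.

    Returns A as a 2x2 nested list. A[1][0] is the lag-1 influence of x on y
    and A[0][1] is the lag-1 influence of y on x.
    """
    sxx = sxy = syy = 0.0          # sums over past x past
    cxx = cxy = cyx = cyy = 0.0    # sums over current x past
    for t in range(1, len(x)):
        px, py = x[t - 1], y[t - 1]
        sxx += px * px
        sxy += px * py
        syy += py * py
        cxx += x[t] * px
        cxy += x[t] * py
        cyx += y[t] * px
        cyy += y[t] * py
    det = sxx * syy - sxy * sxy
    inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]
    # A = (current x past) times inverse(past x past)
    return [
        [cxx * inv[0][0] + cxy * inv[1][0], cxx * inv[0][1] + cxy * inv[1][1]],
        [cyx * inv[0][0] + cyy * inv[1][0], cyx * inv[0][1] + cyy * inv[1][1]],
    ]

# Simulated signals in which x drives y (weight 0.5) but y does not drive x
rng = random.Random(1)
x, y = [0.0], [0.0]
for _ in range(5000):
    x_next = 0.3 * x[-1] + rng.gauss(0.0, 1.0)
    y_next = 0.5 * x[-1] + 0.2 * y[-1] + rng.gauss(0.0, 1.0)
    x.append(x_next)
    y.append(y_next)
A = fit_var1(x, y)
```

In this toy case the off-diagonal coefficient from x to y is recovered near 0.5 while the reverse coefficient stays near zero, which is the asymmetry that a directed path weight is intended to express.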

Granger connectivity path weights for the condition of interest (advice utilization)
for each agent (human, machine) were extracted, populated into two samples, and
compared with independent-samples t-tests (q(FDR) < .05) (Benjamini & Hochberg,
1995) to reveal significantly different effective connectivity paths between the agent
groups (Appendix B.3). Effective connectivity of brain regions (i.e., nodes, edges) was
displayed on a brain surface using BrainNet Viewer, a graphical interface visualization
tool (Xia, Wang, & He, 2013).
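
The false discovery rate control cited above (Benjamini & Hochberg, 1995) is a step-up procedure over the sorted p-values. A minimal sketch follows; the function name benjamini_hochberg and the p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its threshold q * rank / m
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that largest rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for eight connectivity paths (illustration only)
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p_values)
```

For these eight hypothetical p-values only the two smallest survive, because the third smallest (0.039) exceeds its threshold of 0.05 x 3 / 8 and no larger p-value falls under its own threshold.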

3.4  Results 

Behavioral  Results 

The  one-way  ANOVA  comparing  performance  between  the  agent  groups  and  the  no 
agent  group  revealed  a  significant  main  effect  of  Agent  (F(2,  33)  =  5.77,  p  =  .007). 
Planned  follow-up  analysis  revealed  that  the  no  agent  group  performed  better  than  the 
human-agent  group  (t(22)  =  -3.31,  p  =  .003)  and  the  machine-agent  group  (t(22)  =  -2.24, 
p  =  .035)  (Fig.  7a). 
Figure 7. Miss Behavioral Results

Results for the Decision Phase (M ± SEM). a) Task Performance. The no agent group
performed better than the human- and machine-agent groups. b) Advice Utilization. Advice
utilization  was  significantly  lower  for  bad  advice  compared  to  good  advice  and  was  also 
significantly  lower  for  the  machine-agent  group  compared  to  the  human-agent  group. 
Next, we examined advice utilization with mixed ANOVAs. For
advice utilization, significant main effects of Agent (F(1, 22) = 5.24, p = .032), Advice
(F(1, 22) = 140.72, p < .0001) and Time (F(1, 22) = 22.36, p < .0001) were found. These
results indicate that participants accepted advice more from the human agent compared to
the machine agent. Furthermore, good advice was accepted more than bad advice and
advice utilization decreased over time (Fig. 7b). In addition, a significant two-way
interaction of Advice x Time was identified (F(1, 22) = 10.17, p = .004), but no
significant two-way interaction effects of Advice x Agent (F(1, 22) = 0.69, p = .415) or
Time x Agent (F(1, 22) = 0.46, p = .505), and no three-way interaction of Advice x Time x
Agent (F(1, 22) = 1.40, p = .249), were found.

In addition, we looked at pre- and post-reliability/trust ratings. One participant’s
data were not used due to lack of understanding, which was indicated by the high values
for all pre/post scales. The reliability ratings showed no significant main effect of Agent
(F(1, 21) = 6.16, p = .394), but a significant main effect of Time (F(1, 21) = 5.43, p =
.030), showing that reliability ratings decreased from pre- to post-experiment (Fig. 8a).
No significant interaction effect of Time x Agent (F(1, 21) = 0.00, p = .960) was found.
Furthermore, one-sample t-tests on perceived versus actual reliability (60%) of the agent
revealed that pre-reliability ratings were significantly higher than the actual reliability for
the human agent (t(11) = 4.53, p = .001) and the machine agent (t(10) = 3.55, p = .005).
For trust ratings, no significant main effect of Agent (F(1, 21) = 0.01, p = .905) was
found, but a significant main effect of Time (F(1, 21) = 8.18, p = .009) was observed,
showing that trust ratings significantly decreased from pre- to post-experiment (Fig. 8b).
No significant interaction effect of Time x Agent (F(1, 21) = 0.00, p = .960) was
demonstrated.

We next analyzed differences in control measures (e.g., demographic measures
and questionnaires) with independent-samples t-tests. No significant group differences
were identified for any of the control measures (Appendix B.4).

For response times, a significant main effect of Time (F(1, 22) = 5.42, p = .030)
was found, indicating that responses were faster during run 2 compared to run 1
(Appendix B.5a). No significant main effects of Agent (F(1, 22) = 0.77, p = .389) or
Advice (F(1, 22) = 1.34, p = .260) were revealed and no significant interaction effects of
Advice x Agent (F(1, 22) = 3.27, p = .084), Time x Agent (F(1, 22) = 3.28, p = .084),
Advice x Time (F(1, 22) = 2.46, p = .131) or Advice x Time x Agent (F(1, 22) = 0.73, p
= .401) were found.

For monetary deductions, a significant main effect of Time (F(1, 22) = 7.13, p =
.014) was revealed, indicating that deductions were higher during run 1 compared to run
2 (Appendix B.5b). No significant main effects of Advice (F(1, 22) = 1.34, p = .260)
and Agent (F(1, 22) = 0.69, p = .414), or interaction effects of Advice x Agent (F(1, 22)
= 3.54, p = .073), Advice x Time (F(1, 22) = 0.08, p = .776), Time x Agent (F(1, 22) =
0.66, p = .427), or Advice x Time x Agent (F(1, 22) = 2.50, p = .128) were found.
Figure 8. Miss Rating Results

Results for Ratings (M ± SEM). a) Pre- and Post-Reliability. For both groups, the
perceived pre-reliability was significantly higher than the actual reliability of the agent
(60%) and post-reliability ratings significantly decreased. b) Pre- and Post-Trust.
Post-trust was significantly lower than pre-trust for both groups.
For confidence ratings, no main effect of Agent (F(1, 22) = 0.39, p = .538) or
significant interaction effect of Target x Agent (F(1, 22) = 0.50, p = .488) was found, but
a significant main effect of Target (F(1, 22) = 46.30, p < .0001) was revealed, indicating
that confidence was rated higher on target bags compared to non-target bags (Appendix
B.6).

Neuroimaging  Results 

We  investigated  brain  activations  during  the  decision  and  feedback  phases  with  mixed 
ANOVAs. For the decision phase, a significant main effect of Agent (α < .05, k = 11)
was found in the right (R) lingual gyrus (LG) (BA 18), R anterior cingulate cortex (ACC)
(BA 24), left (L) anterior precuneus (aPreC) (superior parietal lobule; BA 7), and L
cuneus (CUN) (BA 18) (Fig. 9, Fig. 10, Tab. 3). A main effect of Advice (α < .05, k =
11) was found in the R middle frontal gyrus (BA 8), R medial frontal gyrus (BA 8), R
rostrolateral prefrontal cortex (rlPFC) (superior frontal gyrus; BA 10), R primary visual
cortex (V1) (BA 17), R pre-supplementary motor area (pre-SMA) (superior frontal gyrus;
BA 6), L cerebellar culmen, and L inferior occipital gyrus (IOG) (BA 18).
Figure 9. Miss Brain Activations During Decision Phase
(α < .05, k = 11). The main effect of Agent during the decision phase significantly
activated the right lingual gyrus (LG), right anterior cingulate cortex (ACC), left anterior
precuneus (aPreC) and left cuneus (CUN).
Figure 10. Miss Activation Patterns During Decision Phase

The  activation  pattern  indicates  higher  activation  for  the  human-  compared  to  machine- 
agent  group  for  all  regions  except  the  ACC.  The  bar  plots  shown  are  for  visualization 
purposes.  To  avoid  circularity,  or  double  dipping,  no  further  statistical  analyses  were 
performed  for  the  decision  and  feedback  phases  (Kriegeskorte,  Simmons,  Bellgowan,  & 
Baker,  2009). 


Table 3. Miss Brain Regions

Brain Regions Associated with the Agent and Advice Main Effects. Brain regions
showing significant activation clusters during the decision phase: Agent
(minimum cluster of 11) and Advice (minimum cluster of 11); and feedback phase:
Agent (minimum cluster of 10) and Advice (minimum cluster of 9) (α < .05, cluster-level
threshold corrected, MNI space).

Brain region                            F value  Cluster size (mm³)      x      y      z

Decision Phase
Agent
Right lingual gyrus                       18.44                 629     27    -78     -6
Right anterior cingulate cortex           24.35                1246      6     27     12
Left anterior precuneus                   19.84                 727     -9    -63     57
Left cuneus                               19.95                 758    -21    -84     15
Advice
Right middle frontal gyrus                24.75                 822     42     18     42
Right medial frontal gyrus                 21.3                3182     21     27     33
Right rostrolateral prefrontal cortex     28.51                 560     24     54      6
Right primary visual cortex               19.72                1722     15    -96     -3
Right pre-supplementary motor area        19.86                 665      6      9     56
Left cerebellar culmen                    17.93                 601    -12    -36    -24
Left inferior occipital gyrus             16.37                1936    -24    -90     -6

Feedback Phase
Agent
Right precentral gyrus                    16.66                 456     51     -6      6
Right inferior parietal lobule               15                 398     48    -26     24
Right cuneus                              15.37                 422     24    -84     15
Left putamen                               19.3                1445    -27    -15      6
Left fusiform gyrus                       19.58                 990    -42    -47    -21
Advice
Right postcentral gyrus                   19.33                 960     42    -18     27
Right middle frontal gyrus                16.78                 631     33     21     39
Right hippocampus                         18.19                1347     29    -39      3
Right extra-nuclear                       18.66                 468     24     21     15
Right orbitofrontal cortex                25.94                 892     21     45     -3
Right posterior cingulate cortex          31.47                1049     12    -63     23
Right anterior precuneus                  23.25                1865      6    -69     47
Left cerebellar culmen                    23.43                2945     -6    -42    -21
Left pons                                 16.91                 373      3     21     51
Left pre-supplementary motor area         18.27                 644    -18    -24    -30
Left parahippocampal gyrus                29.02                1102    -24    -42      0
Left postcentral gyrus                    31.04                1300    -42    -21     27

For the feedback phase, a main effect of Agent (α < .05, k = 10) was found in the
R precentral gyrus (PrG) (BA 6), R inferior parietal lobule (IPL) (BA 40), R CUN (BA
17), L putamen (Pu) and L fusiform gyrus (FG) (BA 37) (Fig. 11, Fig. 12, Tab. 3).
Lastly, a significant main effect of Advice (α < .05, k = 9) during the feedback phase was
found in the R postcentral gyrus (PoG) (BA 3), R middle frontal gyrus (BA 8), R
hippocampus, R extra-nuclear, R orbitofrontal cortex (OFC) (BA 10/11), R posterior
cingulate cortex (PCC) (BA 31), R aPreC (BA 7), L cerebellar culmen, L pre-SMA (BA
6/8), L pons, L parahippocampal gyrus (BA 19) and L PoG (BA 2).


Figure  11.  Miss  Brain  Activations  During  Feedback  Phase 
(α < .05, k = 10). The main effect of Agent during the feedback phase significantly
activated the right precentral gyrus (PrG), right inferior parietal lobule (IPL), right cuneus
(CUN), left putamen (Pu) and left fusiform gyrus (FG).
Figure 12. Miss Brain Activation Patterns During Feedback Phase

The  activation  pattern  shows  higher  activation  for  the  machine-agent  group  compared  to 
the  human-agent  group  for  all  regions  except  for  FG  and  CUN.  The  bar  plots  shown  are 
for  visualization  purposes. 
Effective Connectivity Results

To  identify  effective  connectivity  among  brain  regions  when  comparing  the  human  to  the 
machine  agents  during  the  decision  and  feedback  phases,  we  implemented  multivariate 
GCA based upon our results from the fMRI analysis (q(FDR) < .05). The LG was
identified as the source ROI for the advice utilization network for the decision phase,
sending output connections to all target ROIs (ACC, aPreC, CUN), and the FG was the source
ROI  for  the  feedback  phase  sending  an  output  connection  to  the  IPL  (Fig.  13,  Tab.  4). 


Figure 13. Miss Results for Multivariate Granger Causality Analysis

The  effective  connectivity  network  for  advice  utilization  during  the  decision  phase  when 
comparing  the  human  with  machine  agent  showed  that  the  LG  (lingual  gyrus)  was  the 
driver  of  the  network  and  source  ROI,  sending  outputs  to  all  target  ROIs  (ACC  (anterior 
cingulate  cortex),  aPreC  (anterior  precuneus),  and  CUN  (cuneus))  (all  connections 
survived q(FDR) < .05). The color bar represents the t-value of the comparisons shown
in  Table  4. 
Table 4. Miss Granger Causality Analysis

Path Weights for Granger Causality Analysis. The path weights displayed show
significant effective connectivity paths that are stronger in the human-agent group
compared to the machine-agent group during advice utilization (q(FDR) < .05). The
directionality of the connectivity is shown in the first two columns, with the source
column showing the ROIs that predict activation in the target column ROIs. The strength
of connectivity is given by the mean path weights for each group. LG, lingual gyrus;
ACC, anterior cingulate cortex; aPreC, anterior precuneus; CUN, cuneus; FG, fusiform
gyrus; IPL, inferior parietal lobule.

Source  Target  Path weight (Human)  Path weight (Machine)  t value  p value

Decision Phase
LG      ACC                   0.087                 -0.003     3.23  6.18 x 10^[…]
LG      aPreC                 0.115                  0.009     4.41  5.23 x 10^[…]
LG      CUN                   0.094                 -0.006     3.49  2.43 x 10^[…]

Feedback Phase
FG      IPL                   0.087                 -0.156     3.03  1.20 x 10^-[…]


3.5  Discussion 

The  objective  of  this  research  was  to  expand  on  our  earlier  work  investigating  the 
behavioral  and  neural  signatures  of  advice  utilization  differences  between  expert  human 
and  machine  agents  during  good  and  bad  advice  (Goodyear  et  al.,  2015,  submitted).  We 
manipulated  agent  reliability  with  a  high  miss  rate  to  reveal  the  underlying  neural  basis 
(in  terms  of  both  activated  brain  regions  and  the  directional  interactions  between  them) 
involved  with  advice  utilization.  We  revealed  that  unreliable  advice  decreased 
performance  overall  as  shown  by  other  behavioral  studies  investigating  human-machine 
interactions  (Dzindolet  et  al.,  2002;  Goodyear  et  al.,  2015,  submitted),  and  advice 
utilization  decreased  more  for  the  machine-agent  group  compared  to  the  human-agent 
group, coinciding with another study investigating the effects of source credibility with
varying  reliability  from  humans  and  machines  (Madhavan  &  Wiegmann,  2007). 

As  hypothesized,  our  results  demonstrated  that  advice  utilization  decreased  more 
for  the  machine-agent  group  compared  to  the  human-agent  group.  The  degradation  of 
advice  utilization  occurred  regardless  of  the  type  of  the  advice  (good,  bad)  given, 
showing that disconfirmation experience during bad advice had an effect on all
decision-making behaviors. In our earlier work, we showed that false alarms caused a degradation
of advice utilization during bad advice (Goodyear et al., 2015, submitted), but for our
current study, we expected that misses would cause an overall adjustment in attention
allocation due to previous evidence showing that more critical types of events (misses)
lead to an adaptation in monitoring strategies (Onnasch et al., 2014). Our results
indicated  that  advice  utilization  decreased  for  both  groups,  which  provides  evidence  that 
participants  made  changes  in  their  decision-making  behaviors  to  compensate  for  the 
unreliable  advice  that  they  received. 

In  addition,  we  compared  the  pre-reliability  ratings  with  the  actual  reliability  of 
each  agent  to  uncover  any  preconceived  notions  that  participants  had  about  the  human 
and  machine  agents.  We  demonstrated  that  for  both  groups  the  pre-reliability  ratings 
were  significantly  higher  than  the  actual  reliability,  which  could  indicate  that  participants 
had  high  initial  expectations  of  reliable  advice  since  the  agents  were  framed  as  experts. 

In  addition,  reliability  ratings  decreased  overall  from  pre-  to  post-experiment,  showing 
that  participants  were  able  to  decipher  the  performance  of  the  agents,  while  also 
recalibrating  their  expectations  due  to  bad  advice.  Furthermore,  we  revealed  that  trust 
decreased overall from pre- to post-experiment, revealing that misses degraded trust,
which has previously been reported for false alarms (Dixon et al., 2007; Goodyear et al.,
2015, submitted; Rice & McCarley, 2011). Although the reliability and trust ratings did
not  significantly  decrease  more  for  the  machine-agent  group,  the  ratings  were  still  lower 
compared  to  the  human-agent  group,  which  could  show  that  as  trust  and  reliability 
decreased,  advice  utilization  degraded  as  well.  Lastly,  since  we  showed  no  differences 
for  control  measures  or  confidence  ratings  between  the  agent  groups,  our  results  cannot 
be  explained  by  those  findings. 

We  next  identified  the  neural  mechanisms  and  the  underlying  directional  brain 
network  differentially  involved  with  advice  utilization  between  humans  and  machines. 
For  the  decision  phase,  our  effective  connectivity  network  revealed  the  LG  as  the  driver, 
or  source  ROI,  of  the  network,  sending  outputs  to  the  ACC,  aPreC  and  CUN. 
Furthermore, the strength of the paths emanating from the LG was significantly higher for
human advice compared to machine advice. The results suggest that the LG may have
modulated attention during advice utilization through the bottom-up sensory processing
of  task-relevant  information.  It  has  been  postulated  that  sensory  processing  involves  a 
large-scale  integration  of  networks  with  attention  modulation  to  form  a  behavioral 
outcome,  or  a  cognition  (Mesulam,  1998).  For  example,  it  has  been  shown  that  detection 
of  stimulus  information  initially  starts  in  primary  sensory  areas,  and  is  then  conveyed  to 
regions  such  as  the  ACC,  showing  the  interaction  between  bottom-up  and  top-down 
processing  during  attentional  control  (Crottaz-Herbette  &  Menon,  2006).  Furthermore,  a 
study  investigating  advisor  competence  showed  increased  activity  in  the  visual  cortex 
during advice integration from incompetent advisors (Schilbach, Eickhoff, Schultze,
Mojzisch, & Vogeley, 2013). The authors conclude that the activity in the visual cortices
may relate to “perceptually based strategies” during reassessment of one’s own
judgments, which could support our findings about the influence of visual regions on
upstream structures such as PreC and ACC during advice utilization with unreliable
human advisors. Moreover, the involvement of the visual areas during the decision phase
could be attributed to the fact that participants had to revisualize the X-ray images in
order to compare what they saw to the advice they received.

Furthermore, our neuroimaging results for the decision phase revealed brain
regions associated with attentional control and salience detection (ACC), self-processing
(aPreC) and sensory information processing (LG and CUN). LG activation has been
associated with comparing advice versus no advice in expert and peer groups (Suen,
Brown, Morck, & Silverstone, 2014) and activity in the LG and CUN has been
implicated during decisions under risk when comparing a message to accept or reject
advice with no message (Engelmann et al., 2009) and during decisions correlated with
value or saliency (Litt, Plassmann, Shiv, & Rangel, 2011). ACC activation has been
shown to be involved with conflict monitoring during decision-making (Botvinick, 2007)
and error detection and prediction error (Beckmann, Johansen-Berg, & Rushworth,
2009), while the PreC has been identified to play a role in integrations of one’s mental
state (Terasawa, Fukushima, & Umeda, 2013). Our neuroimaging results demonstrated
that all areas except for the ACC had higher activations for the human-agent group
compared to the machine-agent group, indicating that participants in the human-agent
group may have had a greater increase in perceptual processing and presumably less
monitoring of errors. Conversely, participants in the machine-agent group were more
attuned to the advice errors, as was also indicated behaviorally, which could explain
the ACC activation differences.

In  addition  to  the  decision  phase,  we  expected  a  behavioral  adjustment  in  advice 
utilization  due  to  feedback.  For  the  feedback  phase,  our  effective  connectivity  network 
showed  that  the  FG  was  the  driver  of  the  network  that  sent  an  output  to  the  IPL.  The  FG 
has  been  associated  with  receipt  of  monetary  rewards  and  penalties  during  an  outcome 
phase  (Dillon  et  al.,  2008),  while  the  IPL  has  been  identified  to  play  a  role  during  advice 
evaluation  when  interacting  with  competent  and  incompetent  advisors  (Schilbach  et  al., 
2013)  and  during  decision  uncertainty  when  given  trial-by-trial  feedback  (Vickery  & 
Jiang,  2009).  Furthermore,  the  neuroimaging  results  for  the  feedback  phase  revealed 
activity  in  the  PrG,  CUN  and  Pu.  Activity  in  the  PrG  has  been  implicated  during 
comparisons  of  humans  and  computers  during  rock-paper-scissors  games  (Chaminade  et 
al.,  2012)  and  CUN  activity  has  been  shown  to  be  related  to  inferential  errors  during  a 
feedback  phase  (Cooper,  Kreps,  Wiebe,  Pirkl,  &  Knutson,  2010).  Lastly,  we  revealed 
activity  in  the  dorsal  striatum  (Pu),  which  has  been  implicated  in  stimulus-response 
learning  (Packard  &  Knowlton,  2002)  and  during  responses  to  affective  feedback  in 
regards  to  valence  and  magnitude  (Delgado,  Locke,  Stenger,  &  Fiez,  2003).  Our  results 
for  the  feedback  phase  illustrate  that,  for  all  regions  except  for  CUN  and  FG,  activations 
were  higher  for  the  machine-agent  group  compared  to  the  human-agent  group.  This 
pattern of activation indicates that as participants in the machine-agent group became
more aware of the errors in advice, they may have placed more value on the outcome of
their  decisions  as  opposed  to  just  processing  of  sensory  information. 

There are a couple of limitations that need to be considered with the interpretation of
our results. First, we looked at differences between good and bad advice with only
misses as the type of error. However, our previous research on false alarms (Goodyear et
al., 2015, submitted) provided substantiation for expanding on the effects of advice
utilization  with  different  error  types  and  future  studies  could  include  both  types  of  errors 
to  compare  the  two  directly.  In  addition,  participants  received  advice  before  they  made 
their  decisions  in  order  to  prevent  cognitive  anchoring,  or  the  tendency  to  rely  on  the  first 
piece  of  information  acquired.  Future  studies  could  investigate  the  effects  of  cognitive 
anchoring  by  implementing  a  task  where  participants  receive  advice  after  they  make  their 
decisions. 

In  conclusion,  our  results  have  shown  that  advice  utilization  differs  between 
humans  and  machines  and  those  distinctions  are  contingent  on  miss  errors.  Our  findings 
expand  on  the  existing  literature  by  showing  that  misses  degrade  advice  utilization,  which 
is  represented  in  a  neural  network  involving  salience  detection  and  self-processing  with 
perceptual  integration.  As  our  society  progresses  in  technological  terms,  having  a  greater 
conceptualization  of  how  decision-making  processes  differ  during  interactions  with 
humans  and  machines  can  provide  pertinent  information.  A  better  understanding  of  those 
interactions  can  ultimately  allow  for  safety  measures  to  prevent  any  mishaps  that  can 
occur during advice taking.

CHAPTER FOUR: GENERAL DISCUSSION


This thesis has examined the impact of misses and false alarms during advice
utilization  from  human  and  machine  agents  in  a  series  of  two  studies.  The  goal  of  this 
thesis  was  to  provide  a  basis  for  understanding  the  complex  neural  and  behavioral 
mechanisms  involved  during  advice  utilization,  which  can  ultimately  serve  to  develop  a 
framework  underlining  the  constituents  of  human  and  machine  interactions.  In  each 
study  we  demonstrated  that  there  were  unique  behavioral  responses  and  brain  activation 
patterns  associated  with  each  error  type.  The  rest  of  Chapter  Four  will  generally  discuss 
the  behavioral  and  brain  activation  differences  across  the  studies  along  with  future 
directions. 

4.1  Behavioral  Results 

In  Chapter  Two  and  Chapter  Three  we  revealed  that  the  no  agent  groups 
performed  significantly  better  than  the  human  and  machine  agent  groups.  The  results 
indicate  that  regardless  of  error,  individuals  who  performed  the  task  unassisted,  and  did 
not  receive  unreliable  advice,  performed  better  overall.  It  has  been  postulated  that  false 
alarms  cause  individuals  to  ignore  true  alerts  leading  to  a  decline  in  performance,  while 
misses  create  higher  workloads  from  increased  monitoring,  which  also  affects 
performance (Sanchez, Rogers, Fisk, & Rovira, 2014). We therefore expected, in both
studies, that unreliable advice would decrease performance due to high error rates and
that participants in the no agent groups would perform significantly better. Although
participant performance in the no agent groups was higher than in the agent groups in
both studies, the accuracy rate was still not ideal for any real-world applications. These
results  align  with  the  findings  by  Wickens  and  Dixon  (2007)  that  showed  that  automation 
reliability  below  70%  significantly  decreased  performance  compared  to  performing  the 
task  unassisted.  Although  our  study  included  both  humans  and  machines  with  low 
reliability,  it  is  possible  that  the  optimal  reliability  set  point  is  not  necessarily  dependent 
on  the  source  of  advice.  Moreover,  our  results  provide  evidence  that  misses  could  have 
created  higher  vigilance  in  performance  compared  to  false  alarms,  which  may  have  fewer 
repercussions  if  ignored. 

In  Chapter  Two  we  showed  that  advice  utilization  degraded  more  for  the  human- 
agent  group,  while  in  Chapter  Three  advice  utilization  degraded  more  for  the  machine- 
agent  group.  We  hypothesized  that  advice  utilization  would  decrease  more  for  the 
machine-agent  group  in  both  studies  due  to  previous  findings  that  showed  that  when 
advice  is  70%  reliable,  participants  agree  more  with  expert  humans  and  depend  less  on 
expert  machines  (Madhavan  &  Wiegmann,  2007).  However,  the  study  by  Madhavan  and 
Wiegmann  (2007)  focused  on  the  combination  of  false  alarms  and  misses  without 
separating  the  two  error  types  and  that  might  explain  why  we  found  differences  in 
Chapter Two compared to Chapter Three. Our results further indicate that accountability
may be higher during interactions with a human when the error is a false alarm, and
higher during interactions with a machine when the error is a miss.
Previous work has revealed that 70% reliable automation may disrupt preconceived
notions associated with the perfect automation schema due to the effects of dispositional
factors associated with advisors (Madhavan & Wiegmann, 2007), and participants’
perceived accountability for their performance may be due to automation bias, or the
tendency toward usage of, or reliance on, automation without actively seeking or
processing information (Mosier et al., 1998). Our results reflect a disruption in the
perfect automation schema, or biases associated with automation, when the error was a
miss, which could be due to the costly consequences of a miss error.

For  reliability,  we  revealed  in  Chapter  Two  that  the  human  agent’s  pre-reliability 
was  significantly  higher  than  the  machine  agent’s  pre-reliability  and  the  reliability  ratings 
significantly  decreased  pre-  to  post-experiment  for  the  human-agent  group.  Furthermore, 
the  human  agent’s  perceived  reliability  was  significantly  higher  than  the  actual  reliability 
of  the  agent.  These  results  suggest  that  expectations  of  reliable  advice  were  higher  for 
the  human-agent  group  compared  to  the  machine-agent  group,  which  ultimately  led  to  a 
behavioral  adjustment  in  advice  utilization  over  time.  In  comparison,  for  Chapter  Three, 
the  reliability  ratings  did  not  differ  between  the  agent  groups,  but  the  perceived  reliability 
ratings  for  both  the  human  agent  and  machine  agent  were  significantly  higher  than  the 
actual  reliability,  showing  that  initial  expectations  of  reliable  advice  were  high  for  both 
groups.  Initial  expectations  of  reliable  advice,  as  seen  during  the  comparison  of  the 
perceived  reliability  to  the  actual  reliability  of  each  agent,  can  lead  to  a  decline  in 
dependence  on  an  agent  and  miscalibration  of  an  agent’s  reliability  (Madhavan  & 
Wiegmann, 2007). The reliability ratings were initially higher than the actual reliability
for the human agent in Chapter Two and for both groups in Chapter Three, indicating
high expectations of reliable advice. However, upon observation of the errors (40%)
generated by the agents, the participants’ advice utilization degraded rapidly. Moreover,
in  Chapter  Two,  the  machine  agent’s  perceived  pre-reliability  ratings  were  not 
significantly  different  from  the  actual  reliability  of  the  agent,  showing  that  initial 
expectations  of  reliability  were  not  high  and  thus  participants  may  not  have  needed  to 
recalibrate  their  expectations  as  indicated  by  less  degradation  of  advice  utilization. 

In Chapter Two, we demonstrated that trust significantly decreased for the human
agent; in Chapter Three, however, trust decreased for both groups. In Chapter Two,
advice  utilization  decreased  more  for  the  human-agent  group  compared  to  the  machine- 
agent  group,  which  was  also  reflected  by  the  change  in  trust  ratings  only  for  the  human- 
agent  group.  Similarly,  in  Chapter  Three,  advice  utilization  decreased  for  both  groups, 
which  was  also  reflected  in  the  change  in  trust  ratings  for  both  groups.  It  has  been 
suggested  that  user  attitudes  such  as  trust  may  affect  how  individuals  decide  to  use 
automation  (Lee  &  See,  2004).  For  example,  a  study  showed  that  human  experts  were 
trusted  more  than  machine  experts  (Madhavan  &  Wiegmann,  2007)  which  indicates  that 
trust  may  be  one  of  the  components  involved  during  advice  utilization  interactions  for 
both  humans  and  machines. 

Lastly, we looked at confidence ratings; in both Chapter Two and Chapter Three,
confidence was rated higher on target bags compared to non-target bags.
However, we found no difference between the agent groups for confidence ratings.
Previous research has indicated that self-confidence may affect decision biases, which
may change performance accuracy (Madhavan & Gonzalez, 2006) and when trust
exceeds self-confidence, individuals tend towards automation use (Lee & Moray, 1992).
Since our findings did not show differences between the agent groups, the differences in
advice utilization between the human and machine agents cannot be explained by
self-confidence.

For response times, we found that for both Chapter Two and Chapter Three
responses were faster during run 2 compared to run 1 and for Chapter Two, responses
were faster during good advice compared to bad advice. These results indicate that as
participants became more familiar with the task they were able to respond faster to the
advice given. Furthermore, participants in Chapter Two may have had more conflicting
perceptual processes involved during false alarm trials as reflected by slower responses
during bad advice. Research on response times has demonstrated that false alarms may
result in a delayed or no response to alerts (Breznitz, 2013) and our results are in
accordance with those findings.

Monetary deductions were used as incentives and as a way to create a risky
environment for participants in order to help evaluate variables such as trust towards the
human and machine agents. In Chapter Two, we found that deductions were higher
during bad advice compared to good advice; in Chapter Three we found that deductions
were higher during run 1 compared to run 2. The impact of errors on monetary
deductions was revealed as participants made more costly errors in Chapter Two, while in
Chapter Three, participants made less costly errors over time.


4.2  FMRI  Results 


In  Chapter  Two,  we  revealed  a  network  that  involved  brain  regions  associated 
with  social  evaluations  (aPreC,  PCC),  while  in  Chapter  Three  there  was  a  network 
engaged  with  visual  processing  of  sensory  information  (LG).  As  expected,  the 
comparison of the studies shows that there are distinct neural networks involved with false
alarms  compared  to  misses  during  advice  utilization  from  human  and  machine  agents. 

Our  results  are  in  line  with  the  findings  of  Onnasch  et  al.  (2014)  and  Breznitz  (2013),  that 
false  alarms  may  cause  operators  to  have  delayed  responses,  or  no  response  at  all,  while 
misses may change operators’ strategies during non-alarm periods, causing a reallocation
of  attention.  In  Chapter  Two  we  revealed  a  brain  network  involved  with  social 
evaluations  of  the  dispositional  characteristics  of  the  agents,  while  in  Chapter  Three  there 
was  a  network  involved  with  visual  processing  and  error  monitoring,  as  participants 
shifted  their  attention  towards  the  task  at  hand.  Since  false  alarms  are  not  necessarily 
detrimental,  but  more  of  a  nuisance,  participants  may  have  had  more  time  to  evaluate 
human  traits  such  as  trust  or  agent  effort  leading  to  involvement  of  regions  associated 
with  social  evaluations.  On  the  other  hand,  due  to  the  catastrophic  nature  of  misses, 
participants  may  have  concentrated  more  on  situational  factors,  such  as  task  difficulty, 
which was reflected by recruitment of visual processing regions. The comparisons
between Chapter Two and Chapter Three during the decision phase provide evidence
that there are separate perceptual processes involved with each error type, which has also
been demonstrated with changes in cortical activity during a contrast-detection task
comparing misses to false alarms (Ress & Heeger, 2003).

Interestingly, the feedback phase results demonstrated a similar pattern to that of
the decision phase results for both studies, with areas involved with social evaluations
(dmPFC) and processing of sensory information (FG, IPL). The results indicate that
there was a unique pattern of activity for brain regions involved during the feedback
phase as participants were able to evaluate their own performance based on the advice
given to them. These findings are of particular importance because they provide a greater
discernment of the underlying mechanisms involved during learning and behavioral
adaptations to unreliable advice. As with the decision phase, the feedback phase results
for Chapter Two and Chapter Three provide evidence that there may be distinct
processes involved with perceptions of different error types.

4.3  Future  Directions  and  Conclusions 

The  findings  of  Chapter  Two  and  Chapter  Three  provide  insight  into  the 
differences  between  error  types  during  decision-making,  which  ultimately  serves  to 
optimize  our  understanding  of  how  individuals  choose  to  utilize  or  discount  advice  from 
different  agents.  Future  studies  could  elaborate  on  our  findings  by  implementing  a 
paradigm  with  agent  reliability  above  the  70%  threshold  to  investigate  the  behavioral 
responses  and  the  underlying  brain  network  involved  with  reliable  advice.  Furthermore, 
future  studies  could  expand  on  our  results  by  implementing  a  paradigm  with  no  feedback, 
or  positive  and  negative  feedback,  mirroring  human  etiquette.  Additionally,  we  aimed  to 
discern  the  effective  connectivity  network  associated  with  advice  utilization  with  Granger 
Causality Analysis. Granger Causality Analysis was used for our studies since it is
particularly advantageous for exploratory analysis and for assessing directional
influences of selected ROIs without an a priori hypothesis. To further validate our
findings, future studies could implement methods such as dynamic causal modeling
(DCM) with a hypothesized network that is predefined to model the effective
connectivity results that we discovered.

In conclusion, this thesis has aimed to uncover the factors that influence advice
utilization from humans and machines by assessing the behavioral responses and neural
mechanisms associated with those interactions. The overall objective of this research
was to provide a foundation that will facilitate the development of a cohesive model
explaining the behavioral, cognitive, and neural basis of advice utilization during
human-automation interactions by bridging the gap between human factors and cognitive
neuroscience research. The findings of this thesis are especially salient for the future as
technological progressions continue to increase exponentially and the shift to automation
use becomes inevitable.

APPENDIX A: FALSE ALARMS


A.1 Experimental Setup

[Figure A.1: a) X-ray bag and target; c) trial sequence. Recoverable labels: Fixation, Advice, Decision, Jitter; Advice: Search / Clear; Choice: Search / Clear; Knife Present.]

A.1. a) Example Stimuli Used for the X-ray Luggage-Screening Task. During the
normative rating task, participants rated 320 X-ray luggage images (120 target: 60 high
clutter, 60 low clutter; 200 non-target: 100 high clutter, 100 low clutter) that contained
everyday objects (hair-dryers, clothes, etc.) and a possible target present (5 different
knives, with one possible per image) based on clutter, difficulty and confidence in finding
the knife. b) Decision Matrix. Breakdown for each advice type given during the
experiment. c) X-ray Luggage-Screening Task. During each trial, participants would
first see a fixation cross, advice from one of the agents to “search” or “clear” the bag, an
image of the X-ray luggage bag, a decision to accept or reject the advice of the agent to
“search” or “clear” the bag, fixation crosses, feedback indicating if their decision was
correct or incorrect and lastly, fixation crosses.
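
The decision matrix described in the caption can be illustrated with a short sketch. It labels each piece of advice in signal-detection terms (hit, miss, false alarm, correct rejection) and computes the share of good advice. The function names and the ten-trial list are hypothetical and chosen only for illustration; the 60% good-advice proportion mirrors the 40% error rate reported in the discussion, and the actual trial counts of the experiment are not reproduced here.

```python
from collections import Counter


def classify_advice(advice: str, knife_present: bool) -> str:
    """Label one piece of advice in signal-detection terms."""
    if advice == "search":
        return "hit" if knife_present else "false alarm"
    if advice == "clear":
        return "miss" if knife_present else "correct rejection"
    raise ValueError(f"unknown advice: {advice!r}")


def advisor_reliability(trials):
    """Return the proportion of good advice and the count of each outcome."""
    counts = Counter(classify_advice(advice, present) for advice, present in trials)
    good = counts["hit"] + counts["correct rejection"]
    return good / len(trials), counts


# Hypothetical trial list (advice, knife present): 60% good advice,
# with every error being a miss ("clear" although a knife is present).
trials = [("search", True)] * 3 + [("clear", False)] * 3 + [("clear", True)] * 4
reliability, counts = advisor_reliability(trials)
print(reliability, dict(counts))
```

Replacing the four miss trials with `("search", False)` would turn the same error rate into false alarms, which is the contrast between the two studies.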

A.2 Human and Machine Agent Descriptions


H: Mr. Steve Williams

Mr.  Steve  Williams  (Human)  is  a  trained  luggage  screener,  with  extensive  knowledge  in 
identifying  illegal  imports  inside  airline  luggage.  He  has  served  the  past  5  years  in  some 
of  the  busiest  airports  in  the  United  States  working  at  security  checkpoints.  He  also 
specializes  in  antiterrorism  and  airport  security  and  possesses  extensive  knowledge  about 
the  types  of  modern  weapons  and  explosives  commonly  smuggled  aboard  aircraft.  Mr. 
Williams  has  recently  been  appointed  by  the  Transportation  Security  Administration 
(TSA)  to  oversee  security  operations  at  Dulles  International  Airport,  which  is  one  of  the 
largest  airports  in  the  world. 

Machine:  Automated  Luggage  Inspector 

The  automated  luggage  inspector  (Machine)  is  a  diagnostic  aid  that  has  been  programmed 
to  identify  hidden  contraband  in  airline  luggage.  This  Machine  is  based  upon  the 
technology  traditionally  used  at  major  airport  security  checkpoints  over  the  past  5  years. 
Its  algorithms  are  sophisticated  and  are  based  on  judgments  using  sensors  different  from 
those of the human visual system and can detect modern weapons and explosives
smuggled  aboard  aircrafts.  The  automated  luggage  detector  has  recently  been  employed 
by  the  Transportation  Security  Administration  (TSA)  to  enhance  security  operations  at 
Dulles  International  Airport,  which  is  one  of  the  largest  airports  in  the  world. 

A.3 Brain Regions Associated with the Main Effect of Advice

Brain regions showing significant activation clusters during the decision
(minimum cluster of 21) and feedback (minimum cluster of 36) phases (α < .05,
cluster-level threshold corrected). For the decision phase, a significant activation cluster was
found  in  the  right  orbitofrontal  cortex  (superior  frontal  gyrus,  BA  11).  For  the  feedback 
phase,  significant  activation  clusters  were  found  in  right  middle  frontal  gyrus  (BA  6/8), 
right  superior  parietal  lobule  (BA  7),  right  putamen,  right  posterior  cingulate  cortex  (BA 
30),  right  head  of  the  caudate,  left  orbitofrontal  cortex  (medial  frontal  gyrus,  BA  11),  left 
precentral  gyrus  (BA  4),  left  subcallosal  gyrus  (BA  34),  left  middle  frontal  gyrus  (BA  6), 
left  dorsolateral  prefrontal  cortex  (middle  frontal  gyrus,  BA  46)  and  left  inferior  frontal 
gyrus  (BA  47). 


Region                                 F(1,22)   Cluster size (mm³)      x      y      z

Decision Phase: Advice
Right orbitofrontal cortex               13.14          673             18     45    -18

Feedback Phase: Advice
Right middle frontal gyrus               16.47         4848             36     18     57
Right superior parietal lobule           13.05         2010             21    -45     57
Right putamen                            12.18         1867             33     -3      3
Right posterior cingulate cortex         12.47         4937              6    -51     15
Right head of the caudate                14.27         1968              9     12     -9
Left orbitofrontal cortex                12.30         3348             -9     48    -15
Left precentral gyrus                    15.29         4486            -24    -24     63
Left subcallosal gyrus                   13.08         2204            -12      3    -12
Left middle frontal gyrus                12.05         2553            -33     25     60
Left dorsolateral prefrontal cortex      15.05         2228            -42     36     12
Left inferior frontal gyrus              15.75         1778            -36     27     -6

A.4 Schematic Illustrating the Effective Connectivity Analysis Pipeline

[Figure A.4. Recoverable label: fMRI ROI time series.]

A.5 Behavioral Results for Decision Phase

A.5. (M ± SEM). a) Response Times. Response times were faster overall from run 1 to
run 2 and during good advice compared to bad advice. b) Monetary Deductions.
Monetary deductions were higher overall for bad advice compared to good advice.


A.6  Results  for  the  Confidence  Ratings 


[Figure A.6. Legend: Human, Machine. * p < 0.001.]

A.6. (M ± SEM). Confidence ratings were significantly lower during non-target bags
compared to target bags.

A.7 Descriptive Statistics for Psychological Control Measures

No significant differences were found between the human- and machine-agent
groups (M ± SD).


Category                                        Human           Machine         Statistics (df = 22)

Demographics
  Age                                           20.33 ± 2.55    20.42 ± 2.75    t = -0.08, p = .939
  Education                                     14.08 ± 2.35    14.13 ± 1.65    t = -0.50, p = .960
  Handedness                                    96.53 ± 8.31    92.49 ± 6.77    t = 1.31, p = .205
  Gender (male/female)                          7/5             6/6             χ² = 0.17, p = .683

Complacency-Potential Rating Scale (CPS)
  Confidence                                    15.17 ± 2.13    14.50 ± 1.78    [illegible in source]
  Reliance                                       9.50 ± 1.68    10.33 ± 1.78    t = -1.18, p = .250
  Trust                                          8.58 ± 2.28     8.92 ± 1.44    t = -0.43, p = .672
  Safety                                         6.25 ± 1.71     6.75 ± 2.09    t = -0.64, p = .529

Interpersonal Reactivity Index (IRI)
  Perspective Taking                            28.25 ± 2.30    28.33 ± 3.37    t = -0.71, p = .944
  Fantasy Scale                                 19.33 ± 2.84    20.25 ± 2.80    t = -0.80, p = .434
  Empathic Concern                              21.67 ± 5.07    22.33 ± 2.39    [illegible in source]
  Personal Distress                             20.75 ± 2.80    20.67 ± 2.96    t = 0.71, p = .944

NEO Five-Factor Inventory (NEO-FFI)
  Neuroticism                                   31.33 ± 4.89    32.67 ± 3.94    [illegible in source]
  Extraversion                                  41.92 ± 3.37    40.42 ± 3.26    t = 1.11, p = .280
  Openness                                      37.75 ± 3.60    36.92 ± 4.72    [illegible in source]
  Agreeableness                                 38.67 ± 4.05    41.00 ± 4.51    t = -1.33, p = .196
  Conscientiousness                             41.50 ± 3.56    42.17 ± 3.49    t = -0.46, p = .647

National Technology Readiness Survey (NTRS)
  Optimism                                      37.58 ± 4.87    39.08 ± 4.54    t = -0.78, p = .444
  Innovativeness                                21.75 ± 4.20    24.83 ± 4.24    t = -1.79, p = .087
  Discomfort                                    31.00 ± 5.21    31.50 ± 5.02    t = -0.24, p = .813
  Insecurity                                    30.33 ± 5.68    29.08 ± 3.85    [illegible in source]

Propensity to Trust (PTT)
  Trust towards Automation                      19.83 ± 2.21    20.42 ± 2.07    t = -0.67, p = .511

APPENDIX B: MISSES


B.1 Experimental Setup

B.1. a) X-ray Luggage-Screening Task. During each trial, participants would first see a
fixation cross, advice from one of the agents to “search” or “clear” the bag, an image of
the X-ray luggage bag, a decision to accept or reject the advice of the agent to “search” or
“clear” the bag, fixation crosses, feedback indicating if their decision was correct or
incorrect and lastly, fixation crosses. b) Decision Matrix. Breakdown for each advice
type (good, bad) given during the experiment.

B.2  Effective  Connectivity  Analysis 

Granger causality is based on a concept of causality that can be used to predict directional
influences among chosen brain regions through multivariate effective connectivity
modeling of ROI (region of interest) time courses (Deshpande, LaConte, James, Peltier,
& Hu, 2009; Friston, Harrison, & Penny, 2003; Granger, 1969; Preusse, van der Meer,
Deshpande, Krueger, & Wartenburger, 2011). The model examines the relationship of
variables in time: given two variables, a and b, if past values of a improve the prediction
of the present value of b beyond what past values of b alone provide, then a directional
influence from a to b can be inferred (Goodyear et al., 2015, submitted; Hampstead et al.,
2011; Krueger, Landgraf, van der Meer, Deshpande, & Hu, 2011; Roebroeck, Formisano, &
Goebel, 2005). Granger causality analysis is a data-driven approach and thus is
advantageous for assessing effective connectivity, since, unlike dynamic causal modeling,
it does not require pre-specified connectivity models (Deshpande & Hu,
2012; Deshpande et al., 2009; Deshpande, Sathian, & Hu, 2010).
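
The prediction-based definition above can be made concrete with a minimal sketch. Note that it is a deliberately simplified stand-in, not the analysis used in this thesis: it fits a static, bivariate autoregressive model by ordinary least squares and compares a restricted model (past of b only) with a full model (past of b and a) through an F statistic, whereas the thesis applied a dynamic multivariate autoregressive model to deconvolved ROI time series. The function names, the lag order of 1 and the simulated signals are assumptions made for illustration.

```python
import numpy as np


def lagged_design(series_list, order):
    """Design matrix with an intercept plus `order` lags of each series."""
    n = len(series_list[0])
    columns = [np.ones(n - order)]
    for series in series_list:
        for lag in range(1, order + 1):
            # Row i describes time point i + order; this column is the value `lag` steps earlier.
            columns.append(series[order - lag : n - lag])
    return np.column_stack(columns)


def residual_sum_of_squares(design, y):
    """Least-squares fit of y on the design matrix; returns the residual sum of squares."""
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefficients
    return float(residuals @ residuals)


def granger_f(source, target, order=1):
    """F statistic for the hypothesis that `source` Granger-causes `target`."""
    y = target[order:]
    rss_restricted = residual_sum_of_squares(lagged_design([target], order), y)
    full = lagged_design([target, source], order)
    rss_full = residual_sum_of_squares(full, y)
    df_denominator = len(y) - full.shape[1]
    return ((rss_restricted - rss_full) / order) / (rss_full / df_denominator)


# Simulated signals (hypothetical): a drives b with a one-step delay, b does not drive a.
rng = np.random.default_rng(0)
n = 500
a = np.zeros(n)
b = np.zeros(n)
for t in range(1, n):
    a[t] = 0.5 * a[t - 1] + rng.standard_normal()
    b[t] = 0.5 * b[t - 1] + 0.8 * a[t - 1] + rng.standard_normal()

f_a_to_b = granger_f(a, b)
f_b_to_a = granger_f(b, a)
print(f"F(a -> b) = {f_a_to_b:.1f}, F(b -> a) = {f_b_to_a:.1f}")
```

In this simulation the statistic for a → b should be far larger than the one for b → a, reflecting the asymmetry built into the data. With real fMRI signals the hemodynamic response can blur such timing differences, which is one reason a deconvolution step may be applied before modeling.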

B.3 Schematic Illustrating the Effective Connectivity Analysis Pipeline

[Figure B.3. Recoverable label: Mean time series from activated regions.]

B.3. The mean time series from the ROIs from the decision and feedback phases were
extracted, then blind hemodynamic deconvolution was performed using a Cubature
Kalman Filter to reveal the underlying latent neural time series. Next, these time series
were applied to a dynamic Multivariate Autoregressive Model based on a Granger
causality framework. Granger connectivity path weights were populated into two
samples and t-tests were performed for each effective connectivity path to reveal those
that were significantly different between the agent groups.
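
The last step of the pipeline, comparing path weights between the two agent groups, might look roughly like the sketch below. Everything in it is an assumption for illustration: the path weights are random numbers, the number of ROIs (4) is arbitrary, the group size of 12 is inferred from the reported df = 22, and a pooled-variance t statistic is used because the text does not state which variant of the t-test was applied. The deconvolution and dynamic MVAR steps are not reproduced.

```python
import numpy as np


def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic over subjects (axis 0), one value per connectivity path."""
    n_a, n_b = group_a.shape[0], group_b.shape[0]
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    pooled_var = (
        (n_a - 1) * group_a.var(axis=0, ddof=1)
        + (n_b - 1) * group_b.var(axis=0, ddof=1)
    ) / (n_a + n_b - 2)
    return mean_diff / np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


# Hypothetical path weights: subjects x source ROI x target ROI.
rng = np.random.default_rng(1)
n_subjects, n_rois = 12, 4
human = rng.normal(0.0, 0.2, size=(n_subjects, n_rois, n_rois))
machine = rng.normal(0.0, 0.2, size=(n_subjects, n_rois, n_rois))
machine[:, 0, 1] += 1.0  # inject a group difference on the path ROI 0 -> ROI 1

t_values = two_sample_t(human, machine)  # shape (n_rois, n_rois), df = 22
source_roi, target_roi = np.unravel_index(np.argmax(np.abs(t_values)), t_values.shape)
print(f"largest group difference on path {source_roi} -> {target_roi}")
```

Because one test is run per path, a correction for multiple comparisons would normally be considered before declaring any path significant.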

B.4 Descriptive Statistics for Psychological Control Measures

No significant differences were found between the human- and machine-agent
groups (M ± SD).


Category 

Human 

Machine 

Statistics 

Demographics 

df=22 

Age 

22.58  ±2.39 

21.92  ±2.43 

t  =  0.68,  =  .505 

Education 

16.25  ±  1.71 

16.08  ±2.68 

t  =  0.18,  =  .858 

Gender  (male/female) 

7/5 

7/5 

x'(l)  =  0.67,  p  =  . 414 

Complacency-Potential  Rating  Scale  (CPS;  feelings 
toward  automation) 


Confidence 

Reliance 

Trust 

Safety 

16.17  ±2.41 
10.50±  1.51 

10.17  ±  1.95 
6.50  ±  1.68 

15.42  ±  1.78 
10.08  ±  1.44 
8.67  ±  1.92 
6.00  ±  1.35 

/  =  0.89,;?  =  .395 
t  =  0.69,  =  .496 
t=  1.90,;?  =  .071 
t  =  0.80,  =  .430 

Interpersonal  Reactivity  Index  (IRI;  separate  facet  of 
empathy) 

Perspective  Taking 

Fantasy  Scale 

Empathic  Concern 

Personal  Distress 

27.83  ±2.13 
19.00  ±3.72 

22.83  ±2.25 
19.92  ±3.06 

26.58  ±3.29 

20.58  ±  1.51 
22.92  ±2.19 
19.33  ±2.27 

t=  1.11,;?  =  .281 
^  =  -1.37,;?  =  . 185 
/  = -0.09,;?  =  .928 
t  =  0.53,  =  .601 

NEO  Five-Factor  Inventory  (NEO-FFI;  personality 
styles) 

Neuroticism 

Extraversion 

Openness 

Agreeableness 

Conscientiousness 

31.83  ±3.56 

41.25  ±4.69 
36.50  ±4.44 
38.17  ±4.45 

43.25  ±3.08 

33.50  ±3.83 

40.83  ±3.71 

35.83  ±2.86 

37.50  ±4.98 
42.17  ±4.32 

/  =  -1.10,;?  =  .281 
/  =  0.24,;?  =  .812 
t  =  0.43,  =  .666 
/  =  0.35,;?  =  .733 
/  =  0.71,;?  =  .487 

National  Technology  Readiness  Survey  (NTRS; 
embracing  new  technologies) 

Optimism 

Innovativeness 

Discomfort 

Insecurity 

38.50  ±4.72 
20.92  ±5.62 
28.25  ±  5.64 
28.67  ±5.09 

37.75  ±5.97 
22.58  ±4.10 

30.50  ±4.85 

28.50  ±3.85 

/  =  0.34,;?  =  .736 
/  = -0.83,;?  =  .415 
t  =  -1.05,  =  .306 
/  =  0.09,  y^  =  . 929 

Propensity  to  Trust  (PTT;  trust  towards  automation) 

Trust  towards  Automation 

21.17  ±2.04  21.33  ±2.10  t  =  -0.20,p  =  .846 
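
As a plausibility check, the t statistics in the table above can be recomputed from the reported means and standard deviations. The sketch below assumes a pooled-variance Student's t-test with 12 participants per group, which is inferred from the 7/5 gender split and df = 22 rather than stated explicitly; the function name t_from_summary is invented for this illustration.

```python
from math import sqrt


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n_a=12, n_b=12):
    """Pooled-variance t statistic from group means and SDs; returns (t, df)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Rows copied from the table: (human M, human SD, machine M, machine SD).
rows = {
    "Age": (22.58, 2.39, 21.92, 2.43),
    "CPS Trust": (10.17, 1.95, 8.67, 1.92),
    "IRI Fantasy Scale": (19.00, 3.72, 20.58, 1.51),
}

recomputed = {name: t_from_summary(*values) for name, values in rows.items()}
for name, (t, df) in recomputed.items():
    print(f"{name}: t({df}) = {t:.2f}")
```

The recomputed values agree with the reported statistics to within about 0.01, which is the size of discrepancy expected when working from means and standard deviations rounded to two decimals.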
B.5 Results for the Decision Phase

B.5. (M ± SEM). a) Response Times. Response times were faster overall during run 2
compared to run 1. b) Monetary Deductions. Monetary deductions were higher during
run 1 compared to run 2. GA = good advice; BA = bad advice.

B.6 Confidence Ratings Results

B.6. (M ± SEM). Confidence ratings were significantly lower during non-target bags
compared to target bags.
Abstract

Well-calibrated  human-automation  trust  (HAT)  is  an  essential  ingredient  for  efficiency,  communication,  and 
safety  in  complex  human-automation  interactions.  A  dichotomy  between  HAT  and  human-human  trust 
(HHT)  has  been  proposed:  some  scholars  argue  that  HAT  and  HHT  are  fundamentally  different  due  to 
initial  perception  and  lack  of  intention  on  the  part  of  automation,  while  others  claim  that  HAT  and  HHT  are 
equal,  since  similar  social  interactions  as  between  humans  can  be  elicited  when  automation  is  designed  to 
be human-like. Although recent behavioral research has provided evidence for both accounts and a
plethora of neural evidence for HHT already exists, the underlying neural signatures of HAT and
its relationship to HHT are still unexplored. Behavioral measures alone are unlikely to allow one to
distinguish  between  HHT  and  HAT,  because  the  same  behavioral  outcome  can  be  associated  with  very 
different  underlying  neural  mechanisms.  Assessing  both  performance  and  brain  function  can  provide  more 
information  than  either  alone.  The  objective  of  this  proposal  was  to  investigate  the  similarities  and 
differences  of  the  neural  systems  of  HAT  and  HHT  in  a  series  of  three  studies  that  combined  a  behavioral 
X-ray  luggage-screening  task  with  functional  magnetic  resonance  imaging  (fMRI)  and  manipulated 
reliabilities  of  advice  (unknown  to  the  participants)  as  the  key  feature  for  HAT  and  HHT  interactions. 

Healthy participants were asked to search for knives hidden in densely cluttered X-ray images of luggage
after receiving advice (presence or absence of a knife) from a human or automated luggage inspector
(framed as experts). HAT and HHT were measured as the acceptance rates of advice given by either the
machine or the human agent. By adopting a comprehensive, interdisciplinary research program including
scientists from social cognitive neuroscience, psychology, and human factors, we accomplished the overall
objective of this proposal by pursuing the following three specific aims:

Aim #1: Neural signatures of HAT based on reliable human-automation interactions. In study 1, participants
performed the security screening task and decided whether to search or clear the luggage after receiving
advice from a human or automated luggage inspector with a manipulated reliability of 90%. HHT was
initially lower than HAT, probably due to the preconceived notion that automation is perfect. However,
over time, differences between HHT and HAT disappeared as the received feedback increased participants'
confidence in the human adviser's ability to perform the task. This reinforcement learning
process was mirrored by activations in reward-sensitive brain regions, including the dorsal striatum and
ventromedial prefrontal cortex. In summary, by comparing HHT and HAT, study 1 provided the first neural
evidence showing how automation bias mediates these types of trust, thus leading to behavioral
differences in the context of advice taking.

Aim  #2:  Neural  signatures  of  HAT  based  on  unreliable  human-automation  interactions  due  to  high  false 
alarm  rates.  In  study  2,  participants  completed  the  X-ray  luggage-screening  task  by  either  rejecting  or 
accepting  bad  or  good  advice  from  either  a  machine  or  human  inspector  with  a  manipulated  reliability  of 
60%  (false  alarm  rate).  Unreliable  advice  decreased  performance  overall.  HHT  was  lower  than  HAT  during 
bad  advice,  presumably  due  to  reevaluation  of  expectations  arising  from  association  of  dispositional 
credibility  for  each  agent.  Trust  differences  engaged  brain  regions  associated  with  the  mentalizing  network 
for  evaluating  personal  characteristics  and  traits  (precuneus,  posterior  cingulate  cortex,  temporoparietal 
junction)  and  the  salience  network  for  interoception  (posterior  insula).  Posterior  insula  and  left  precuneus 
were  the  drivers  of  the  HHT  network  that  were  reciprocally  connected  to  each  other  and  also  projected  to 
all  other  regions.  In  summary,  study  2  revealed  insights  into  the  neural  underpinnings  of  HAT  and  HHT 
associated  with  unreliable  advice  utilization  due  to  high  false  alarm  rates. 

Aim #3: Neural signatures of HAT based on unreliable human-automation interactions due to high miss
rates (60%). In study 3, participants performed the X-ray luggage-screening task by either accepting or
rejecting good or bad advice from either a human or a machine inspector with a manipulated reliability
of 60% (miss rate). HAT decreased more than HHT over time, possibly due to high expectations of reliable
advice from a machine and changes in attention allocation due to miss errors. Brain areas involved with the
salience and mentalizing networks, as well as sensory processing involved with attention, were less active
for HAT than for HHT. The HAT network consisted of attentional modulation of sensory information with the
lingual gyrus as the driver during the decision phase and the fusiform gyrus as the driver during the
feedback phase of the task. In summary, study 3 expanded on the existing literature by showing how
misses degrade HAT in comparison to HHT, which is represented in brain regions involved in salience
detection and self-processing with perceptual integration.

The  performed  studies  are  innovative,  because  they  were  among  the  first  directly  to  examine  and  compare 
the  neural  signatures  of  HAT  (and  its  relationship  to  HHT)  in  the  context  of  human-automation  performance 
applying  a  multi-disciplinary  approach.  The  findings  have  significant  implications  for  society  because  of 
progressions  in  technology  and  increased  interactions  with  machines.  Moreover,  those  findings  are 
relevant  to  the  Air  Force  Office  of  Scientific  Research's  mission  aimed  at  fostering  innovative  research  and 
enhancing  the  Air  Force's  impact  on  policies  and  operations  related  to  national  security  by  investing  in  the 
discovery of the foundational concepts of trust building and trust calibration during complex
human-machine interactions. Overall, the successful completion of this project resulted in two substantive
project outcomes: first, a significant increase in our knowledge about the underlying neural circuits of HAT
calibration during complex human-automation interactions, and second, laboratory results that provide a
methodology and rationale for exploring HAT in field research and for developing transformative novel
theories and models.
Findings of study 1 were submitted as an abstract to the 21st Annual Meeting of the Cognitive
Neuroscience Society (Boston, MA; April 5-8, 2014):
The research effort for this project culminated in the production of one dissertation. In April 2006, Kimberly
S. Goodyear will defend her dissertation entitled "The neural basis of advice utilization during human and
machine agent interactions" to the graduate faculty of George Mason University in partial fulfillment of the
requirements for the degree of Doctor of Philosophy in Neuroscience. The dissertation includes the findings
from study 1 and study 2 (see attachment). The PI of the research project will act as the Dissertation
Director.

Moreover,  a  manuscript  entitled  "Advice  utilization  during  human  and  machine  interactions:  an  fMRI  and 
effective  connectivity  study"  based  on  the  findings  of  study  2  is  currently  under  review  as  an  original 
research  article  in  the  journal  "Frontiers  in  Human  Neuroscience": 

Authors: Kimberly Goodyear, Raja Parasuraman, Sergey Chernyak, Poornima Madhavan, Gopikrishna
Deshpande, Frank Krueger

Author Contributions: K.G. and S.C. acquired the data for analysis. K.G., R.P. and F.K. contributed to the
conception of the design. K.G., R.P., S.C., P.M., G.D. and F.K. contributed to interpretation of the data. K.G.,
R.P., S.C., P.M., G.D. and F.K. contributed to drafting of the work and revising it critically. K.G., R.P., S.C.,
P.M., G.D. and F.K. approved the final version to be published. K.G., R.P., S.C., P.M., G.D. and F.K. agreed
to be accountable for all aspects of the work.

Abstract:  With  new  technological  advances,  advice  can  come  from  different  sources  such  as  machines  or 
humans,  but  how  individuals  respond  to  such  advice  and  the  neural  correlates  involved  need  to  be  better 
understood.  We  combined  functional  MRI  and  multivariate  Granger  causality  analysis  with  an  X-ray 
luggage-screening  task  to  investigate  the  neural  basis  and  corresponding  effective  connectivity  involved 
with  advice  utilization  from  agents  framed  as  experts.  Participants  were  asked  to  accept  or  reject  good  or 
bad advice from a human or machine agent with manipulated reliability (high false alarm rate). We showed
that unreliable advice decreased performance overall and participants interacting with the human agent
had  a  greater  depreciation  of  advice  utilization  during  bad  advice.  These  differences  in  advice  utilization 
can  be  due  to  reevaluation  of  expectations  arising  from  association  of  dispositional  credibility  for  each 
agent.  We  demonstrated  that  differences  in  advice  utilization  engaged  brain  regions  associated  with 
evaluation  of  personal  characteristics  and  traits  (precuneus,  posterior  cingulate  cortex,  temporoparietal 
junction)  and  interoception  (posterior  insula).  We  found  that  the  right  posterior  insula  and  left  precuneus 
were  the  drivers  of  the  advice  utilization  network  that  were  reciprocally  connected  to  each  other  and  also 
projected  to  all  other  regions.  Our  behavioral  and  neuroimaging  results  have  significant  implications  for 
society  because  of  progressions  in  technology  and  increased  interactions  with  machines. 

Finally,  another  manuscript  entitled  "An  fMRI  and  effective  connectivity  study  investigating  miss  errors 
during  advice  utilization  from  human  and  machine  agents"  based  on  the  findings  of  study  3  is  currently 
under  review  as  an  original  research  article  in  the  journal  "Social  Neuroscience": 

Authors:  Kimberly  Goodyear,  Raja  Parasuraman,  Sergey  Chernyak,  Ewart  de  Visser,  Poornima  Madhavan, 
Gopikrishna  Deshpande,  Frank  Krueger 

Author Contributions: K.G. and S.C. acquired the data for analysis. K.G., R.P. and F.K. contributed to the
conception of the design. K.G., R.P., S.C., P.M., G.D. and F.K. contributed to interpretation of the data. K.G.,
R.P., S.C., E.D.V., P.M., G.D. and F.K. contributed to drafting of the work and revising it critically. K.G., R.P.,
S.C., E.D.V., P.M., G.D. and F.K. approved the final version to be published. K.G., R.P., S.C., E.D.V., P.M.,
G.D. and F.K. agreed to be accountable for all aspects of the work.

Abstract: As society becomes more reliant on machines and automation, understanding how people utilize
advice  is  a  necessary  endeavor.  Our  objective  was  to  reveal  the  underlying  neural  mechanisms  during 
advice  utilization  from  expert  human  and  machine  agents  with  fMRI  and  multivariate  Granger  causality 
analysis.  During  an  X-ray  luggage-screening  task,  participants  accepted  or  rejected  good  or  bad  advice 
from  either  the  human  or  machine  agent  framed  as  experts  with  manipulated  reliability  (high  miss  rate).  We 
showed  that  the  machine-agent  group  decreased  their  advice  utilization  compared  to  the  human-agent 
group  and  these  differences  in  behaviors  during  advice  utilization  could  be  accounted  for  by  high 
expectations  of  reliable  advice  and  changes  in  attention  allocation  due  to  miss  errors.  Brain  areas  involved 
with  the  salience  and  mentalizing  networks,  as  well  as  sensory  processing  involved  with  attention,  were 
recruited  during  the  task  and  the  advice  utilization  network  consisted  of  attentional  modulation  of  sensory 
information  with  the  lingual  gyrus  as  the  driver  during  the  decision  phase  and  the  fusiform  gyrus  as  the 
driver during the feedback phase. Our findings expand on the existing literature by showing that misses
degrade advice utilization, which is represented in a neural network involving salience detection and
self-processing with perceptual integration.

Changes  in  research  objectives  (if  any): 

None 

Change  in  AFOSR  Program  Manager,  if  any: 

Dr. Benjamin Knott replaced Dr. Joseph Lyons on August 1st, 2013 as the Program Officer for the Trust and
Influence portfolio.

Extensions granted or milestones slipped, if any:

None 